
Showing papers on "Real image published in 2002"


Book ChapterDOI
28 May 2002
TL;DR: This paper investigates how a combination of image invariants, covariants, and multiple view relations can be used in concert to enable efficient multiple view matching, producing a matching algorithm which is linear in the number of views.
Abstract: There has been considerable success in automated reconstruction for image sequences where small baseline algorithms can be used to establish matches across a number of images. In contrast, in the case of widely separated views, methods have generally been restricted to two or three views. In this paper we investigate the problem of establishing relative viewpoints given a large number of images where no ordering information is provided. A typical application would be where images are obtained from different sources or at different times: both the viewpoint (position, orientation, scale) and lighting conditions may vary significantly over the data set. Such a problem is not fundamentally amenable to exhaustive pairwise and triplet wide baseline matching because this would be prohibitively expensive as the number of views increases. Instead, we investigate how a combination of image invariants, covariants, and multiple view relations can be used in concert to enable efficient multiple view matching. The result is a matching algorithm which is linear in the number of views. The methods are illustrated on several real image data sets. The output enables an image based technique for navigating in a 3D scene, moving from one image to whichever image is the next most appropriate.

670 citations
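
The linear scaling comes from indexing invariant descriptors once, rather than matching all O(n^2) view pairs. Below is a minimal Python sketch of that indexing idea, assuming some invariant descriptor per feature is already computed; the quantisation, all names, and the omission of the paper's covariants and multi-view verification stages are simplifications for illustration:

```python
import numpy as np
from collections import defaultdict

def quantize(descriptor, cell=0.25):
    """Map an invariant descriptor vector to a hashable grid cell."""
    return tuple(np.floor(np.asarray(descriptor) / cell).astype(int))

def index_views(views):
    """views: one array per image, each row an invariant descriptor.
    Builds an inverted index: cell -> [(view_id, feature_id), ...].
    A single pass over all features, hence linear in the number of views."""
    index = defaultdict(list)
    for v, descriptors in enumerate(views):
        for f, d in enumerate(descriptors):
            index[quantize(d)].append((v, f))
    return index

def candidate_matches(index, descriptors, view_id):
    """Features landing in the same cell as a feature of `view_id` become
    putative multi-view matches, to be verified afterwards with two- and
    three-view relations (not shown)."""
    matches = []
    for f, d in enumerate(descriptors):
        for v2, f2 in index[quantize(d)]:
            if v2 != view_id:
                matches.append(((view_id, f), (v2, f2)))
    return matches
```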


Book ChapterDOI
Jordi Freixenet, X. Munoz, David Raba, Joan Martí, Xavier Cufí
28 May 2002
TL;DR: This paper reviews different segmentation proposals which integrate edge and region information and highlights 7 different strategies and methods to fuse such information.
Abstract: Image segmentation has been, and still is, a relevant research area in Computer Vision, and hundreds of segmentation algorithms have been proposed in the last 30 years. However, it is well known that elemental segmentation techniques based on boundary or region information often fail to produce accurate segmentation results. Hence, in the last few years, there has been a tendency towards algorithms which take advantage of the complementary nature of such information. This paper reviews different segmentation proposals which integrate edge and region information and highlights 7 different strategies and methods to fuse such information. In contrast with other surveys which only describe and compare qualitatively different approaches, this survey deals with a real quantitative comparison. In this sense, key methods have been programmed and their accuracy analyzed and compared using synthetic and real images. A discussion justified with experimental results is given, and the code is available on the Internet.

556 citations
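
One family of strategies the survey covers embeds edge information directly in the region-forming decision. A minimal sketch of that kind of integration, assuming a precomputed binary edge map; the 4-neighbourhood, the running-mean test, and the tolerance are illustrative choices, not any surveyed method in particular:

```python
import numpy as np
from collections import deque

def edge_constrained_growing(image, edges, seed, tol=10.0):
    """Seeded region growing that never crosses edge pixels.
    image: 2-D float array; edges: same-shape bool array; seed: (row, col)."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and not edges[nr, nc]
                    and abs(image[nr, nc] - total / count) <= tol):
                region[nr, nc] = True   # admit pixel: similar and not an edge
                total += float(image[nr, nc])
                count += 1
                queue.append((nr, nc))
    return region
```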


Journal ArticleDOI
TL;DR: Here exploiting pixel intensity proved to be more beneficial than exploiting the details of image chromaticity statistics, and the three-dimensional (3-D) gamut-mapping algorithms gave the best performance.
Abstract: For Part I, see ibid., vol. 11, no. 9, pp. 972-84 (2002). We test a number of the leading computational color constancy algorithms using a comprehensive set of images. These were of 33 different scenes under 11 different sources representative of common illumination conditions. The algorithms studied include two gray world methods, a version of the Retinex method, several variants of Forsyth's (1990) gamut-mapping method, Cardei et al.'s (2000) neural net method, and Finlayson et al.'s color by correlation method (Finlayson et al. 1997, 2001; Hubel and Finlayson 2000). We discuss a number of issues in applying color constancy ideas to image data, and study in depth the effect of different preprocessing strategies. We compare the performance of the algorithms on image data with their performance on synthesized data. All data used for this study are available online at http://www.cs.sfu.ca/~color/data, and implementations for most of the algorithms are also available (http://www.cs.sfu.ca/~color/code). Experiments with synthesized data (part one of this paper) suggested that the methods which emphasize the use of the input data statistics, specifically color by correlation and the neural net algorithm, are potentially the most effective at estimating the chromaticity of the scene illuminant. Unfortunately, we were unable to realize comparable performance on real images. Here exploiting pixel intensity proved to be more beneficial than exploiting the details of image chromaticity statistics, and the three-dimensional (3-D) gamut-mapping algorithms gave the best performance.

400 citations
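
Of the algorithms compared, gray world is the simplest to state: assume the average scene reflectance is achromatic, so the per-channel image mean estimates the illuminant colour. A minimal numpy sketch of that baseline; the preprocessing the paper studies (segmentation, clipping dark pixels, etc.) is omitted:

```python
import numpy as np

def gray_world_illuminant(image):
    """Estimate the illuminant chromaticity of an (H, W, 3) linear RGB
    image as its normalised per-channel mean (gray-world assumption)."""
    est = image.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)

def correct_to_gray(image, illuminant):
    """Diagonal (von Kries) correction: rescale channels so the estimated
    illuminant becomes neutral."""
    return image * (illuminant.mean() / illuminant)
```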


Journal ArticleDOI
TL;DR: An algorithm for reconstructing three-dimensional structure and motion causally, in real time from monocular sequences of images is described and it is proved that the algorithm is minimal and stable, in the sense that the estimation error remains bounded with probability one throughout a sequence of arbitrary length.
Abstract: We describe an algorithm for reconstructing three-dimensional structure and motion causally, in real time from monocular sequences of images. We prove that the algorithm is minimal and stable, in the sense that the estimation error remains bounded with probability one throughout a sequence of arbitrary length. We discuss a scheme for handling occlusions (point features appearing and disappearing) and drift in the scale factor. These issues are crucial for the algorithm to operate in real time on real scenes. We describe in detail the implementation of the algorithm, which runs on a personal computer and has been made available to the community. We report the performance of our implementation on a few representative long sequences of real and synthetic images. The algorithm, which has been tested extensively over the course of the past few years, exhibits honest performance when the scene contains at least 20-40 points with high contrast, when the relative motion is "slow" compared to the sampling frequency of the frame grabber (30 Hz), and the lens aperture is "large enough" (typically more than 30° of visual field).

359 citations


Journal ArticleDOI
TL;DR: A fast and reliable stereo matching algorithm which produces a dense disparity map by using fast cross correlation, rectangular subregioning (RSR) and 3D maximum-surface techniques in a coarse-to-fine scheme.
Abstract: This paper presents a fast and reliable stereo matching algorithm which produces a dense disparity map by using fast cross correlation, rectangular subregioning (RSR) and 3D maximum-surface techniques in a coarse-to-fine scheme. Fast correlation is achieved by using the box-filtering technique, whose speed is invariant to the size of the correlation window, and by segmenting the stereo images into rectangular subimages at different levels of the pyramid. By working with rectangular subimages, not only can the speed of the correlation be further increased, but the intermediate memory storage requirement can also be reduced. The disparity map for the stereo images is found in the 3D correlation coefficient volume by obtaining the global 3D maximum-surface rather than simply choosing the position that gives the local maximum correlation coefficient value for each pixel. The 3D maximum-surface is obtained using our new two-stage dynamic programming (TSDP) technique. There are two original contributions in this paper: (1) development of the RSR technique for fast similarity measures and (2) development of the TSDP technique for efficiently obtaining the 3D maximum-surface in a 3D volume. Typical running time of our algorithm implemented in the C language on a 512 × 512 image is on the order of a few seconds on a 500 MHz PC. A variety of synthetic and real images have been tested, and good results have been obtained.

256 citations
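
The box-filtering step that makes the matching cost independent of window size can be written with an integral image: any window sum then costs four lookups. The sketch below uses it inside a deliberately simplified sum-of-absolute-differences, winner-take-all matcher; the paper instead uses cross correlation and extracts a global 3D maximum surface with its two-stage dynamic programming, which is not reproduced here:

```python
import numpy as np

def box_sum(img, radius):
    """Window sums over (2*radius+1)^2 boxes via an integral image, so the
    cost per pixel is independent of the window size (box filtering)."""
    k = 2 * radius + 1
    pad = np.pad(img, (radius + 1, radius), mode="edge").astype(np.float64)
    ii = pad.cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def sad_disparity(left, right, max_disp, radius=3):
    """Winner-take-all disparity map from box-filtered absolute differences.
    left, right: rectified 2-D float arrays of equal shape."""
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = box_sum(diff, radius)
    return cost.argmin(axis=0)
```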


Journal ArticleDOI
TL;DR: An algorithm for the calibration of a paracatadioptric device using only the images of lines in space is presented, and it is shown that all of the intrinsic parameters can be obtained from the images of only three lines, without any metric information.
Abstract: Catadioptric sensors refer to the combination of lens-based devices and reflective surfaces. These systems are useful because they may have a field of view which is greater than hemispherical, providing the ability to simultaneously view in any direction. Configurations which have a unique effective viewpoint are of primary interest, among these is the case where the reflective surface is a parabolic mirror and the camera is such that it induces an orthographic projection and which we call paracatadioptric. We present an algorithm for the calibration of such a device using only the images of lines in space. In fact, we show that we may obtain all of the intrinsic parameters from the images of only three lines and that this is possible without any metric information. We propose a closed-form solution for focal length, image center, and aspect ratio for skewless cameras and a polynomial root solution in the presence of skew. We also give a method for determining the orientation of a plane containing two sets of parallel lines from one uncalibrated view. Such an orientation recovery enables a rectification which is impossible to achieve in the case of a single uncalibrated view taken by a conventional camera. We study the performance of the algorithm in simulated setups and compare results on real images with an approach based on the image of the mirror's bounding circle.

210 citations


Journal ArticleDOI
01 Sep 2002
TL;DR: A framework based on the visual servoing approach well known in robotics is proposed, which features simplicity, accuracy, efficiency, and scalability in order to achieve real-time augmented reality applications.
Abstract: This paper presents a framework to achieve real-time augmented reality applications, based on the visual servoing approach well known in robotics. We consider pose or viewpoint computation as a problem similar to visual servoing, which allows one to take advantage of all the research that has been carried out in this domain in the past. The proposed method features simplicity, accuracy, efficiency, and scalability with respect to the camera model as well as to the features extracted from the image. We illustrate the efficiency of our approach on augmented reality applications with various real image sequences.

196 citations


Journal ArticleDOI
01 Jun 2002
TL;DR: A novel approach called the "one-circle" algorithm for measuring the eye gaze using a monocular image that zooms in on only one eye of a person, showing the possibility of finding the unique eye gaze direction from a single image of one eye.
Abstract: There are two components to the human visual line-of-sight: pose of human head and the orientation of the eye within their sockets. We have investigated these two aspects but will concentrate on eye gaze estimation. We present a novel approach called the "one-circle" algorithm for measuring the eye gaze using a monocular image that zooms in on only one eye of a person. Observing that the iris contour is a circle, we estimate the normal direction of this iris circle, considered as the eye gaze, from its elliptical image. From basic projective geometry, an ellipse can be back-projected into space onto two circles of different orientations. However, by using a geometric constraint, namely, that the distance between the eyeball's center and the two eye corners should be equal to each other, the correct solution can be disambiguated. This allows us to obtain a higher resolution image of the iris with a zoom-in camera, thereby achieving higher accuracies in the estimation. A general approach that combines head pose determination with eye gaze estimation is also proposed. The searching of the eye gaze is guided by the head pose information. The robustness of our gaze determination approach was verified statistically by extensive experiments on synthetic and real image data. The two key contributions are that we show the possibility of finding the unique eye gaze direction from a single image of one eye and that one can obtain better accuracy as a consequence of this.

187 citations


Journal ArticleDOI
TL;DR: An energy-based approach to estimating a dense disparity map from a set of two weakly calibrated stereoscopic images while preserving the discontinuities resulting from image boundaries.

184 citations


Journal ArticleDOI
TL;DR: The theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming is described and some near-ambiguities that arise under rotational motions are identified.
Abstract: In this paper we describe the theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming. The basis of our approach is to make use of the so-called infinite homography constraint which relates the unknown calibration matrices to the computed inter-image homographies. In order for the calibration to be possible some constraints must be placed on the internal parameters of the camera. We present various self-calibration methods. First an iterative non-linear method is described which is very versatile in terms of the constraints that may be imposed on the camera calibration: each of the camera parameters may be assumed to be known, constant throughout the sequence but unknown, or free to vary. Secondly, we describe a fast linear method which works under the minimal assumption of zero camera skew or the more restrictive conditions of square pixels (zero skew and known aspect ratio) or known principal point. We show experimental results on both synthetic and real image sequences (where ground truth data was available) to assess the accuracy and the stability of the algorithms and to compare the results of applying different constraints on the camera parameters. We also derive an optimal Maximum Likelihood estimator for the calibration and the motion parameters. Prior knowledge about the distribution of the estimated parameters (such as the location of the principal point) may also be incorporated via Maximum a Posteriori estimation. We then identify some near-ambiguities that arise under rotational motions, showing that coupled changes of certain parameters are barely observable, making them indistinguishable. Finally we study the negative effect of radial distortion in the self-calibration process and point out some possible solutions to it.

178 citations
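
The infinite homography constraint the method builds on has a compact algebraic form. For a camera rotating about its centre, the inter-image homography is H_ij = K_j R_ij K_i^{-1}, and eliminating the rotation leaves equations in the intrinsics alone. A sketch of the standard derivation (the homographies must first be normalised, since they are only defined up to scale):

```latex
H_{ij} = K_j R_{ij} K_i^{-1}, \qquad R_{ij} R_{ij}^{\top} = I
\quad\Longrightarrow\quad
K_j K_j^{\top} = H_{ij}\,\bigl(K_i K_i^{\top}\bigr)\,H_{ij}^{\top}.
```

Each computed homography thus constrains the entries of K K^T (the dual image of the absolute conic) linearly, which is what makes the fast linear method possible once zero skew or similar assumptions are imposed.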


Journal ArticleDOI
TL;DR: The robust cost functions and the associated hierarchical minimization techniques that are proposed efficiently mix non-parametric (dense) representations, local interacting parametric representations, and global non-interacting parametric representations related to a partition into regions.
Abstract: In this paper we present a comprehensive energy-based framework for the estimation and the segmentation of the apparent motion in image sequences. The robust cost functions and the associated hierarchical minimization techniques that we propose efficiently mix non-parametric (dense) representations, local interacting parametric representations, and global non-interacting parametric representations related to a partition into regions. Experimental comparisons, both on synthetic and real images, demonstrate the merit of the approach on different types of photometric and kinematic contents ranging from moving rigid objects to moving fluids.

Book ChapterDOI
28 May 2002
TL;DR: The approach integrates color analysis and multibaseline stereo in a synergistic manner to yield accurate separation and depth, as demonstrated by the results on synthetic and real image sequences.
Abstract: Specular reflections present difficulties for many areas of computer vision such as stereo and segmentation. To separate specular and diffuse reflection components, previous approaches generally require accurate segmentation, regionally uniform reflectance or structured lighting. To overcome these limiting assumptions, we propose a method based on color analysis and multibaseline stereo that simultaneously estimates the separation and the true depth of specular reflections. First, pixels with a specular component are detected by a novel form of color histogram differencing that utilizes the epipolar constraint. This process uses relevant data from all the stereo images for robustness, and addresses the problem of color occlusions. Based on the Lambertian model of diffuse reflectance, stereo correspondence is then employed to compute for specular pixels their corresponding diffuse components in other views. The results of color-based detection aid the stereo correspondence, which determines both separation and true depth of specular pixels. Our approach integrates color analysis and multibaseline stereo in a synergistic manner to yield accurate separation and depth, as demonstrated by our results on synthetic and real image sequences.

Journal ArticleDOI
TL;DR: A new method to automatically compute, reorient, and recenter the mid-sagittal plane in anatomical and functional three-dimensional (3-D) brain images, which is fast and accurate, even for strongly tilted heads, and even in the presence of high acquisition noise and bias field.
Abstract: We present a new method to automatically compute, reorient, and recenter the mid-sagittal plane in anatomical and functional three-dimensional (3-D) brain images. This iterative approach is composed of two steps. At first, given an initial guess of the mid-sagittal plane (generally, the central plane of the image grid), the computation of local similarity measures between the two sides of the head allows homologous anatomical structures or functional areas to be identified by way of a block matching procedure. The output is a set of point-to-point correspondences: the centers of homologous blocks. Subsequently, we define the mid-sagittal plane as the one best superposing the points on one side and their counterparts on the other side by reflective symmetry. Practically, the computation of the parameters characterizing the plane is performed by a least trimmed squares estimation. Then, the estimated plane is aligned with the center of the image grid, and the whole process is iterated until convergence. The robust estimation technique we use allows normal or abnormal asymmetrical structures or areas to be treated as outliers, and the plane to be mainly computed from the underlying gross symmetry of the brain. The algorithm is fast and accurate, even for strongly tilted heads, and even in the presence of high acquisition noise and bias field, as shown on a large set of synthetic data. The algorithm has also been visually evaluated on a large set of real magnetic resonance (MR) images. We present a few results on isotropic as well as anisotropic anatomical (MR and computed tomography) and functional (single photon emission computed tomography and positron emission tomography) real images, for normal and pathological subjects.
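
The plane-fitting step has a simple core once block matching has produced pairs (p_i, q_i) that should mirror each other: the displacements p_i - q_i should be parallel to the plane normal, and the midpoints should lie on the plane. A minimal unweighted Python sketch of that step; the paper's least trimmed squares estimation additionally rejects outlier pairs from asymmetric structures, and the whole procedure is iterated with realignment:

```python
import numpy as np

def midsagittal_plane(p, q):
    """Fit a reflective-symmetry plane {x : n.x = d} to matched pairs.
    p, q: (N, 3) arrays; p[i] and q[i] should mirror each other."""
    disp = p - q                                    # parallel to the normal
    n = (disp / np.linalg.norm(disp, axis=1, keepdims=True)).mean(axis=0)
    n /= np.linalg.norm(n)                          # unit plane normal
    mid = 0.5 * (p + q)                             # midpoints lie on the plane
    return n, float((mid @ n).mean())

def reflect(x, n, d):
    """Reflect points (N, 3) across the plane {x : n.x = d}."""
    return x - 2.0 * ((x @ n) - d)[:, None] * n
```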

Proceedings ArticleDOI
03 Nov 2002
TL;DR: In this article, a combination of principal component analysis (PCA) and multiscale pyramid decomposition is used to estimate the local orientation of an image.
Abstract: This paper presents an image local orientation estimation method based on a combination of two already well-known techniques: principal component analysis (PCA) and multiscale pyramid decomposition. PCA is applied to find the maximum likelihood (ML) estimate of the local orientation. The proposed technique is shown to enjoy excellent robustness against noise. We present both simulated and real image examples to demonstrate the proposed technique.
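
A common way to realise the PCA step is to apply it to the gradient vectors of a neighbourhood: the weakest eigen-direction of their scatter matrix is the direction of least intensity change, i.e. the local orientation. A single-scale sketch along those lines; the paper adds the multiscale pyramid on top of an estimate like this, and its exact formulation may differ:

```python
import numpy as np

def local_orientation(patch):
    """Dominant orientation (radians in [0, pi)) of a 2-D patch, from the
    PCA of its gradient vectors: the eigenvector with the smallest
    eigenvalue of the gradient scatter matrix points along the direction
    of least intensity variation."""
    gy, gx = np.gradient(patch.astype(np.float64))
    g = np.stack([gx.ravel(), gy.ravel()], axis=1)   # one gradient per pixel
    evals, evecs = np.linalg.eigh(g.T @ g)           # ascending eigenvalues
    ox, oy = evecs[:, 0]                             # smallest-eigenvalue axis
    return float(np.arctan2(oy, ox) % np.pi)
```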

Proceedings ArticleDOI
24 Jun 2002
TL;DR: This work casts the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries, and formulates the problem based on nonparametric density estimates.
Abstract: We present a novel information theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution, and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use fast level set methods to implement the resulting evolution. The evolution equations are based on nonparametric statistics and have an intuitive appeal. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems.
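
The objective being maximised can be estimated directly from counts. The sketch below scores a given binary labeling by the mutual information between labels and quantised intensities; the paper's curve evolution, level set implementation, and nonparametric (kernel) density estimates are not shown, and the histogram form and bin count here are illustrative:

```python
import numpy as np

def label_intensity_mi(image, labels, bins=64):
    """Mutual information I(L; I) between a binary label map and pixel
    intensities, estimated from a joint histogram.
    image: 2-D array; labels: same shape, values in {0, 1}."""
    joint, _, _ = np.histogram2d(labels.ravel().astype(float),
                                 image.ravel().astype(float),
                                 bins=[2, bins])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)    # label marginal, shape (2, 1)
    py = pxy.sum(axis=0, keepdims=True)    # intensity marginal, (1, bins)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())
```

Maximising this score over labelings, subject to the boundary-length penalty, is what the curve evolution performs.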

Patent
Yukinobu Ishino, Tadashi Ohta
18 Mar 2002
TL;DR: In this article, a position detector displays a target on a given plane and adds a standard on the given plane in the vicinity of the target, with the location of the standard known.
Abstract: A position detector displays a target on a given plane and adds a standard on the given plane in the vicinity of the target, with the location of the standard known. An image of the given plane, including an image of the standard, is formed on the image plane of an image sensor; the point in the image of the given plane that is formed at a predetermined position of the image plane corresponds to the point to be detected. An image processor identifies the image of the standard on the image plane to calculate the position of the point to be detected. The standard includes an asymmetric pattern. The standard includes a first standard and a second standard sequentially added on the given plane, the difference being calculated with a plus or minus sign. The image on the given plane is formed by means of scanning, and the image sensor reads out the sensed image upon the termination of at least one period of the scanning. The second standard is added upon the initiation of the scanning after the completion of the reading out of the image of the first standard.

Book ChapterDOI
28 May 2002
TL;DR: A complete approach is proposed that detects the problem and defers the computation of parameters that are ambiguous in projective space (i.e. the registration between partial reconstructions only sharing a common plane and poses of cameras only seeing planar features) till after self-calibration.
Abstract: In this paper we address the problem of uncalibrated structure and motion recovery from image sequences that contain dominant planes in some of the views. Traditional approaches fail when the features common to three consecutive views are all located on a plane. This happens because in the uncalibrated case there is a fundamental ambiguity in relating the structure before and after the plane. This is, however, a situation that is often hard to avoid in man-made environments. We propose a complete approach that detects the problem and defers the computation of parameters that are ambiguous in projective space (i.e. the registration between partial reconstructions only sharing a common plane and poses of cameras only seeing planar features) till after self-calibration. Also a new linear self-calibration algorithm is proposed that couples the intrinsics between multiple subsequences. The final result is a complete metric 3D reconstruction of both structure and motion for the whole sequence. Experimental results on real image sequences show that the approach yields very good results.

Journal ArticleDOI
TL;DR: A wavelet decomposition approach for rotation-invariant template matching, which uses the ring-projection transform, itself invariant to object orientation, to represent an object pattern in the detail subimage.

Book ChapterDOI
28 May 2002
TL;DR: This paper develops a novel representation of human motion using low-dimensional spatio-temporal models that are learned from motion capture data of human subjects; the representation proves suitable for initializing more complex probabilistic models of human motion.
Abstract: This paper proposes a solution for the automatic detection and tracking of human motion in image sequences. Due to the complexity of the human body and its motion, automatic detection of 3D human motion remains an open, and important, problem. Existing approaches for automatic detection and tracking focus on 2D cues and typically exploit object appearance (color distribution, shape) or knowledge of a static background. In contrast, we exploit 2D optical flow information which provides rich descriptive cues, while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, we develop a novel representation of human motion using low-dimensional spatio-temporal models that are learned using motion capture data of human subjects. In addition to human motion (the foreground) we probabilistically model the motion of generic scenes (the background); these statistical models are defined as Gibbsian fields specified from the first-order derivatives of motion observations. Detection and tracking are posed in a principled Bayesian framework which involves the computation of a posterior probability distribution over the model parameters (i.e., the location and the type of the human motion) given a sequence of optical flow observations. Particle filtering is used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution are related to the pose parameters of a 3D articulated model (e.g. the approximate joint angles and movement direction). Thus the approach proves suitable for initializing more complex probabilistic models of human motion. As shown by experiments on real image sequences, our method is able to detect and track people under different viewpoints with complex backgrounds.

Proceedings ArticleDOI
01 Jan 2002
TL;DR: A new automatic approach is described that extends a classical 2D image blending technique to a 3D surface, which produces high quality photo-realistic results at a low computational cost.
Abstract: This paper describes a novel system for building seamless texture maps for a surface of arbitrary topology from real images of the object taken with a standard digital camera and uncontrolled lighting. In our application we wish to take a sparse set of real images of a 3D object, and apply the images to an approximate surface model of the object to generate a high quality textured model. In practice the measured colour and intensity for a surface element observed in different photographic images will not agree. This is due to the interaction between real world lighting effects (such as highlights and specularities) and variations in the camera gain settings as well as registration and surface modelling errors. We describe a new automatic approach that extends a classical 2D image blending technique to a 3D surface, which produces high quality photo-realistic results at a low computational cost.

Patent
Adrian Travis
01 Aug 2002
TL;DR: In this paper, a flat liquid-crystal screen which ejects light from the plane at a selectable line is used to display a 3D image; in the case of a three-dimensional display, several video projectors, each projecting an image as seen at a slightly different angle, combine to constitute a three-dimensional display which produces a three-dimensional image that is one line high.
Abstract: A video display for two or three dimensions has a flat liquid-crystal screen which ejects light from the plane at a selectable line. One or, in the case of a 3-D display, several video projectors project a linear image into the plane from an edge. A complete image is written on the screen by addressing the line with appropriate images as it is scanned down the screen. To screen a three-dimensional image, the video projectors, each projecting an image as seen at a slightly different angle, combine to constitute a three-dimensional display which produces a three-dimensional image that is one line high.

Journal ArticleDOI
TL;DR: A novel and robust statistic, named increment sign correlation, is proposed as a similarity measure for robust image registration; through statistical analysis and modeling, it is formalized as a binary distribution, or a Gaussian distribution for large image sizes.

Journal ArticleDOI
TL;DR: A new approach for estimating and tracking the three-dimensional pose of a human face from face images obtained from a single monocular view with full perspective projection, which is more robust than existing feature-based approaches to face pose estimation.

Book ChapterDOI
28 May 2002
TL;DR: This work presents a method to recover a dense optical flow field map from two images, while explicitly taking into account the symmetry across the images as well as possible occlusions and discontinuities in the flow field.
Abstract: Traditional techniques of dense optical flow estimation don't generally yield symmetrical solutions: the results will differ if they are applied between images I1 and I2 or between images I2 and I1. In this work, we present a method to recover a dense optical flow field map from two images, while explicitly taking into account the symmetry across the images as well as possible occlusions and discontinuities in the flow field. The idea is to consider both displacement fields, from I1 to I2 and from I2 to I1, and to minimise an energy functional that explicitly encodes all those properties. This variational problem is then solved using the gradient flow defined by the Euler-Lagrange equations associated with the energy. In order to reduce the risk of being trapped in some irrelevant minimum, a focusing strategy based on a multi-resolution technique is used to converge toward the solution. Promising experimental results on both synthetic and real images are presented to illustrate the capabilities of this symmetrical variational approach to recover accurate optical flow.
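
The symmetry requirement can be made concrete as a joint energy over the two flow fields. A schematic functional consistent with the description above, with ρ a robust penalty and α, β weights; the paper's exact data, occlusion, and discontinuity terms differ:

```latex
E(w_{12}, w_{21}) =
    \int \rho\bigl(I_2(x + w_{12}(x)) - I_1(x)\bigr)\,dx
  + \int \rho\bigl(I_1(x + w_{21}(x)) - I_2(x)\bigr)\,dx
  + \alpha \int \bigl(\lVert\nabla w_{12}\rVert^2 + \lVert\nabla w_{21}\rVert^2\bigr)\,dx
  + \beta \int \bigl\lVert w_{12}(x) + w_{21}\bigl(x + w_{12}(x)\bigr)\bigr\rVert^2\,dx,
```

where the last term vanishes exactly when each field is the inverse of the other; occlusion indicators (omitted here) switch off the terms at pixels that have no counterpart in the other image.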

Patent
09 Dec 2002
TL;DR: In this paper, a camera for imaging the periphery of a vehicle is provided; images captured by the camera are stored, and a processed version of a past image, captured before the vehicle reached its current position, is overlaid on the part of the current image that is a blind-spot region hidden by the car body or the like, so that a combined image is created.
Abstract: A camera for imaging the periphery of a vehicle is provided, and the images captured by the camera are stored. A processed version of a past image, captured before the vehicle reached its current position, is overlaid on the part of the currently captured image that is a blind-spot region hidden by the car body or the like, so that a combined image is created. A contour line symbolizing the car body is combined with the combined image, and the image is displayed.

Patent
02 May 2002
TL;DR: In this paper, the authors proposed a display device which can provide both a slot game having realism and a colorful and rich display by mechanical reels, which is shown by superimposing in the entire region or a part of the region of the virtual image display region.
Abstract: An object of the invention is to provide a display device which can offer both a realistic slot game and a colorful, rich display by mechanical reels. A superimposed display device according to the present invention comprises an image display section disposed in a position facing the player side, mechanical reels disposed on a line which intersects another line connecting the image display section and the player side, a lighting section for illuminating the mechanical reels, and a half mirror inclined in a plane including the region where the above lines intersect, in which virtual images of the designs on the peripheries of the mechanical reels are shown in the same plane to the player, and images are shown superimposed over the entire virtual image display region or a part of it.

Journal ArticleDOI
TL;DR: A set of neighborhoods is introduced to generate dynamical coupling structures associated with a specific oscillator, making the segmentation network robust to noise in an image; unlike in image processing algorithms, no iterative operation is needed for noise removal.

Journal ArticleDOI
TL;DR: The discrete singular convolution (DSC) algorithm for edge detection is introduced and the performance of the proposed algorithm is compared with many other existing methods, such as the Sobel, Prewitt and Canny detectors.

Journal ArticleDOI
TL;DR: This paper introduces the human hand model with 27 degrees of freedom (DOFs) and analyzes some of its constraints, reducing the 27 DOFs to 12 without any significant degradation of performance.

Book ChapterDOI
28 May 2002
TL;DR: A completely automatic method for obtaining the approximate calibration of a camera (alignment to a world frame and focal length) from a single image of an unknown scene, provided only that the scene satisfies a Manhattan world assumption.
Abstract: We present a completely automatic method for obtaining the approximate calibration of a camera (alignment to a world frame and focal length) from a single image of an unknown scene, provided only that the scene satisfies a Manhattan world assumption. This assumption states that the imaged scene contains three orthogonal, dominant directions, and is often satisfied by outdoor or indoor views of man-made structures and environments. The proposed method combines the calibration likelihood introduced in [5] with a stochastic search algorithm to obtain a MAP estimate of the camera's focal length and alignment. Results on real images of indoor scenes are presented. The calibrations obtained are less accurate than those from standard methods employing a calibration pattern or multiple images. However, the outputs are certainly good enough for common vision tasks such as tracking. Moreover, the results are obtained without any user intervention, from a single image, and without use of a calibration pattern.
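
A worked consequence of the Manhattan assumption helps make the result concrete: with zero skew and unit aspect ratio, the three dominant directions project to vanishing points whose back-projected rays must be mutually orthogonal. Note that the paper estimates focal length and alignment by a stochastic MAP search over its calibration likelihood rather than by solving these equations directly:

```latex
K = \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\bigl(K^{-1} v_i\bigr)^{\top} \bigl(K^{-1} v_j\bigr) = 0 \quad (i \neq j),
```

so each pair of finite vanishing points v_i, v_j gives one equation in (f, u_0, v_0); with the principal point p fixed at the image centre, a single pair already yields f^2 = -(v_i - p) · (v_j - p).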