
Showing papers by "Andrew Zisserman published in 2000"


Book
01 Jan 2000
TL;DR: The authors provide comprehensive background material, cover the geometric principles and how to represent objects algebraically so they can be computed and applied, and explain how to apply the methods and implement the algorithms, all within a unified framework.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations


Journal ArticleDOI
TL;DR: A new robust estimator, MLESAC, is presented as a generalization of RANSAC: it adopts the same sampling strategy as RANSAC to generate putative solutions, but chooses the solution that maximizes the likelihood rather than just the number of inliers.

2,267 citations
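The scoring change MLESAC introduces can be illustrated with a toy 2D line-fitting sketch. The bounded-loss cost below is the common MSAC-style simplification of the full MLESAC mixture likelihood; the sampling loop, the 95% threshold and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def msac_score(residuals, sigma):
    """Bounded-loss cost: inliers contribute their squared residual,
    outliers a constant penalty (a common simplification of the
    MLESAC likelihood). Lower is better."""
    t2 = (1.96 * sigma) ** 2          # assumed 95% inlier threshold
    return np.sum(np.minimum(residuals ** 2, t2))

def fit_line_msac(points, sigma=1.0, iters=500, rng=None):
    """Fit a 2D line ax + by + c = 0 (with a^2 + b^2 = 1) by random
    two-point sampling, keeping the hypothesis with the lowest
    bounded-loss cost rather than the highest inlier count."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_line = np.inf, None
    for _ in range(iters):
        p, q = points[rng.choice(len(points), 2, replace=False)]
        d = q - p
        n = np.array([-d[1], d[0]])   # normal to the sampled segment
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                  # degenerate (coincident) sample
        n = n / norm
        c = -n @ p
        cost = msac_score(points @ n + c, sigma)
        if cost < best_cost:
            best_cost, best_line = cost, (n[0], n[1], c)
    return best_line, best_cost
```

Swapping `msac_score` for a plain inlier count recovers classical RANSAC; the bounded loss is what lets two hypotheses with equal inlier counts be ranked by how well the inliers actually fit.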


Journal ArticleDOI
TL;DR: An algebraic representation is developed which unifies the three types of measurement and permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement.
Abstract: We describe how 3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane, and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the camera's internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose). In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement. We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.

760 citations
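The minimal geometric input mentioned above (vanishing lines and points) is conveniently handled in homogeneous coordinates, where the line through two points and the intersection of two lines are both cross products. A minimal sketch with hypothetical pixel coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product)."""
    return np.cross(p, q)

def intersection(l, m):
    """Homogeneous intersection point of two lines (cross product)."""
    return np.cross(l, m)

# Two imaged edges of parallel scene lines (hypothetical pixel
# coordinates, homogeneous with w = 1). Their image lines meet at
# the vanishing point for that scene direction.
edge1 = line_through(np.array([100.0, 400.0, 1.0]),
                     np.array([180.0, 200.0, 1.0]))
edge2 = line_through(np.array([300.0, 400.0, 1.0]),
                     np.array([260.0, 200.0, 1.0]))
vp = intersection(edge1, edge2)
vp = vp / vp[2]   # back to inhomogeneous pixel coordinates
```

The vanishing line of a plane follows the same pattern: it is the line through two vanishing points of directions in that plane.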


BookDOI
01 Jan 2000
TL;DR: This paper proposes an experimental comparison of several different stereo algorithms, using real imagery, and explores two different methodologies, with different strengths and weaknesses.
Abstract: While many algorithms for computing stereo correspondence have been proposed, there has been very little work on experimentally evaluating algorithm performance, especially using real (rather than synthetic) imagery. In this paper we propose an experimental comparison of several different stereo algorithms. We use real imagery, and explore two different methodologies, with different strengths and weaknesses. Our first methodology is based upon manual computation of dense ground truth. Here we make use of two stereo pairs: one of these, from the University of Tsukuba, contains mostly fronto-parallel surfaces; while the other, which we built, is a simple scene with a slanted surface. Our second methodology uses the notion of prediction error, which is the ability of a disparity map to predict an (unseen) third image, taken from a known camera position with respect to the input pair. We present results for both correlation-style stereo algorithms and techniques based on global methods such as energy minimization. Our experiments suggest that the two methodologies give qualitatively consistent results. Source images and additional materials, such as the implementations of various algorithms, are available on the web from http://www.research.microsoft.com/~szeliski/stereo.

346 citations
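The prediction-error methodology can be sketched in a toy form: forward-warp one input image by a scaled disparity to predict a third view, then measure the RMS intensity error against that view. The 1-D pixel-shift model and nearest-neighbour splatting below are simplifying assumptions, not the paper's evaluation code:

```python
import numpy as np

def predict_view(image, disparity, scale):
    """Forward-predict a view at `scale` times the input baseline by
    shifting each pixel horizontally by scale * disparity
    (nearest-neighbour splat; a toy rectified-camera model)."""
    h, w = image.shape
    pred = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xp = int(round(x + scale * disparity[y, x]))
            if 0 <= xp < w:
                pred[y, xp] = image[y, x]
                filled[y, xp] = True
    return pred, filled

def prediction_rms(pred, filled, truth):
    """RMS intensity error over the pixels the warp actually filled."""
    diff = (pred - truth)[filled].astype(float)
    return np.sqrt(np.mean(diff ** 2))
```

A good disparity map yields a low RMS error on the held-out view; occlusions and unfilled pixels are excluded via the `filled` mask, which is one of the subtleties the methodology has to handle.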


Proceedings ArticleDOI
05 Oct 2000
TL;DR: A markerless camera tracking system for augmented reality is described that operates in environments containing one or more planes, a common special case which is shown to significantly simplify tracking.
Abstract: We describe a markerless camera tracking system for augmented reality that operates in environments which contain one or more planes. This is a common special case, which we show significantly simplifies tracking. The result is a practical, reliable, vision-based tracker. Furthermore, the tracked plane imposes a natural reference frame, so that the alignment of the real and virtual coordinate systems is rather simpler than would be the case with a general structure-and-motion system. Multiple planes can be tracked, and additional data such as 2D point tracks are easily incorporated.

330 citations
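Tracking a plane between views amounts to estimating the homography it induces. A minimal sketch of the standard DLT estimate from point correspondences (the paper's tracker is more elaborate; this only illustrates the underlying plane-to-plane map):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (n >= 4 point
    pairs, inhomogeneous pixel coordinates) with the standard DLT:
    stack two linear constraints per correspondence and take the
    null vector of the system via SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null vector = smallest singular vector
    return H / H[2, 2]              # fix the scale ambiguity
```

In a tracking loop the homography from the previous frame provides the starting point for the next, which is part of what makes the planar special case so much simpler than general structure-and-motion.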


Journal Article
TL;DR: A brief overview of the use of feature-based methods in structure and motion computation is given in this paper; a companion paper by Irani and Anandan [16] reviews direct methods.
Abstract: This report is a brief overview of the use of “feature based” methods in structure and motion computation. A companion paper by Irani and Anandan [16] reviews “direct” methods.

288 citations


Journal ArticleDOI
TL;DR: Multi-view relationships are developed for lines, conics and non-algebraic curves; the plane of an imaged curve is determined in a projective reconstruction, and the homography induced by this plane is used for transfer from one image to another.
Abstract: This paper describes the geometry of imaged curves in two and three views. Multi-view relationships are developed for lines, conics and non-algebraic curves. The new relationships focus on determining the plane of the curve in a projective reconstruction, and in particular using the homography induced by this plane for transfer from one image to another. It is shown that given the fundamental matrix between two views, and images of the curve in each view, then the plane of a conic may be determined up to a two fold ambiguity, but local curvature of a curve uniquely determines the plane. It is then shown that given the trifocal tensor between three views, this plane defines a homography map which may be used to transfer a conic or the curvature from two views to a third. Simple expressions are developed for the plane and homography in each case. A set of algorithms are then described for automatically matching individual line segments and curves between images. The algorithms use both photometric information and the multiple view geometric relationships. For image pairs the homography facilitates the computation of a neighbourhood cross-correlation based matching score for putative line/curve correspondences. For image triplets cross-correlation matching scores are used in conjunction with line/curve transfer based on the trifocal geometry to disambiguate matches. Algorithms are developed for both short and wide baselines. The algorithms are robust to deficiencies in the segment extraction and partial occlusion. Experimental results are given for image pairs and triplets, for varying motions between views, and for different scene types. The methods are applicable to line/curve matching in stereo and trinocular rigs, and as a starting point for line/curve matching through monocular image sequences.

173 citations


Proceedings ArticleDOI
03 Sep 2000
TL;DR: Two estimators suitable for the enhancement of text images are proposed: a maximum a posteriori (MAP) estimator based on a Huber prior and an estimator regularized using the total variation norm, which demonstrates the improved noise robustness of these approaches over the Irani and Peleg estimator.
Abstract: The objective of this work is the super-resolution enhancement of image sequences. We consider in particular images of scenes for which the point-to-point image transformation is a plane projective transformation. We first describe the imaging model, and a maximum likelihood (ML) estimator of the super-resolution image. We demonstrate the extreme noise sensitivity of the unconstrained ML estimator. We show that the Irani and Peleg (1991, 1993) super-resolution algorithm does not suffer from this sensitivity, and explain that this stability is due to the error back-projection method which effectively constrains the solution. We then propose two estimators suitable for the enhancement of text images: a maximum a posteriori (MAP) estimator based on a Huber prior and an estimator regularized using the total variation norm. We demonstrate the improved noise robustness of these approaches over the Irani and Peleg estimator. We also show the effects of a poorly estimated point spread function (PSF) on the super-resolution result and explain conditions necessary for this parameter to be included in the optimization. Results are evaluated on both real and synthetic sequences of text images. In the case of the real images, the projective transformations relating the images are estimated automatically from the image data, so that the entire algorithm is automatic.

161 citations
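The Huber prior used in the MAP estimator penalizes intensity gradients quadratically when they are small and only linearly when they are large, so noise is smoothed while text edges are preserved. A toy 1-D illustration of such a MAP objective, with gradient descent and parameter values chosen purely for illustration (the paper's estimator operates on 2-D images within the full super-resolution imaging model):

```python
import numpy as np

def huber(t, alpha):
    """Huber penalty: quadratic near zero, linear in the tails."""
    a = np.abs(t)
    return np.where(a <= alpha, t ** 2, 2 * alpha * a - alpha ** 2)

def huber_grad(t, alpha):
    """Derivative of the Huber penalty."""
    return np.where(np.abs(t) <= alpha, 2 * t, 2 * alpha * np.sign(t))

def map_denoise_1d(y, lam=1.0, alpha=0.1, iters=500, step=0.05):
    """Toy 1-D MAP estimate: minimize ||x - y||^2 plus a Huber
    penalty on first differences (a stand-in for the image-gradient
    prior), by plain gradient descent."""
    x = y.copy()
    for _ in range(iters):
        d = np.diff(x)
        g = 2.0 * (x - y)           # data-term gradient
        gp = huber_grad(d, alpha)
        g[:-1] -= lam * gp          # prior gradient w.r.t. x[i] via d[i]
        g[1:] += lam * gp           # ... and via d[i-1]
        x -= step * g
    return x
```

Replacing `huber` with a pure quadratic gives the Gaussian prior that over-smooths edges; replacing it with `|t|` approaches the total variation regularizer, the other estimator proposed in the paper.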


Journal ArticleDOI
TL;DR: It is demonstrated that grouping together features that satisfy a geometric relationship can be used both for the (automatic) detection and for the estimation of vanishing points and lines.

138 citations


01 Jan 2000
TL;DR: In this article, a method for automatically reconstructing a 3D piecewise planar model from multiple images of a scene is described, using inter-image homographies to validate and best estimate planar facets.
Abstract: A new method is described for automatically reconstructing a 3D piecewise planar model from multiple images of a scene. The novelty of the approach lies in the use of inter-image homographies to validate and best estimate planar facets, and in the minimal initialization requirements — only a single 3D line with a textured neighbourhood is required to generate a plane hypothesis. The planar facets enable line grouping and also the construction of parts of the wireframe which were missed due to the inevitable shortcomings of feature detection and matching. The method allows a piecewise planar model of a scene to be built completely automatically, with no user intervention at any stage, given only the images and camera projection matrices as input. The robustness and reliability of the method are illustrated on several examples, from both aerial and interior views.

117 citations
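Plane hypotheses can be validated because a scene plane induces a homography between views. The sketch below uses the calibrated, normalized-camera special case H = R + t nᵀ/d for illustration; the paper itself works with inter-image homographies in an uncalibrated setting:

```python
import numpy as np

def plane_induced_homography(R, t, n, d):
    """Homography induced between two normalized (calibrated) cameras
    P1 = [I|0] and P2 = [R|t] by the scene plane n . X = d:
    H = R + t n^T / d, mapping image-1 rays of plane points to
    image-2 rays."""
    return R + np.outer(t, n) / d
```

A plane hypothesis is then supported by the point matches whose image-1 positions map (via H) close to their image-2 positions, which is exactly the kind of consistency test the validation step relies on.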


Book ChapterDOI
26 Jun 2000
TL;DR: This paper extends the recovery of structure and motion to image sequences with several independently moving objects; Euclidean reconstruction becomes possible in the multibody case where it was underconstrained for a static scene.
Abstract: This paper extends the recovery of structure and motion to image sequences with several independently moving objects. The motion, structure, and camera calibration are all a-priori unknown. The fundamental constraint that we introduce is that multiple motions must share the same camera parameters. Existing work on independent motions has not employed this constraint, and therefore has not gained over independent static-scene reconstructions. We show how this constraint leads to several new results in structure and motion recovery, where Euclidean reconstruction becomes possible in the multibody case, when it was underconstrained for a static scene. We show how to combine motions of high-relief, low-relief and planar objects. Additionally we show that structure and motion can be recovered from just 4 points in the uncalibrated, fixed camera, case. Experiments on real and synthetic imagery demonstrate the validity of the theory and the improvement in accuracy obtained using multibody analysis.

Journal ArticleDOI
TL;DR: A method for 3D segmentation from voxel data which combines statistical classification and geometry-driven segmentation is described and how the partial volume effect is estimated and object measurements are obtained is discussed.

Book ChapterDOI
26 Jun 2000
TL;DR: An m-view, n ≥ 6 point robust reconstruction algorithm is presented which uses the 6 point method as a search engine, extending the successful RANSAC based algorithms for 2 views and 3 views to m views.
Abstract: The paper has two main contributions: The first is a set of methods for computing structure and motion for m ≥ 3 views of 6 points. It is shown that a geometric image error can be minimized over all views by a simple three parameter numerical optimization. Then, that an algebraic image error can be minimized over all views by computing the solution to a cubic in one variable. Finally, a minor point, is that this "quasi-linear" linear solution enables a more concise algorithm, than any given previously, for the reconstruction of 6 points in 3 views. The second contribution is an m view n ≥ 6 point robust reconstruction algorithm which uses the 6 point method as a search engine. This extends the successful RANSAC based algorithms for 2-views and 3-views to m views. The algorithm can cope with missing data and mismatched data and may be used as an efficient initializer for bundle adjustment. The new algorithms are evaluated on synthetic and real image sequences, and compared to optimal estimation results (bundle adjustment).
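The robust algorithm's structure is the familiar hypothesize-and-verify loop, with the 6 point method as the minimal solver. A generic, hypothetical skeleton (the function names and the toy 1-D model in the usage below are illustrative assumptions, not the paper's code):

```python
import numpy as np

def ransac(data, fit_minimal, residuals, sample_size, threshold,
           iters=200, rng=None):
    """Generic hypothesize-and-verify loop: draw minimal samples,
    fit a model to each, score it by its inlier count, and return
    the best consensus set (which would then seed a refit, e.g.
    bundle adjustment). An m-view algorithm plugs its minimal
    solver in as `fit_minimal`."""
    rng = rng or np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(data), sample_size, replace=False)
        model = fit_minimal(data[idx])
        inliers = residuals(model, data) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

For illustration, a 1-D location model with gross outliers: `fit_minimal` is the sample value itself and `residuals` the absolute deviation; the loop recovers the consensus cluster while ignoring the outliers.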

Proceedings ArticleDOI
01 Sep 2000
TL;DR: The objective of this paper is to estimate the orientation of a scene plane from an uncalibrated perspective image under the assumption that the scene is coated with a homogeneous (but unknown) texture, and an algorithm is given which is applicable to both regular and irregular textures.
Abstract: The objective of this paper is to estimate the orientation of a scene plane from an uncalibrated perspective image under the assumption that the scene is coated with a homogeneous (but unknown) texture. We make the following novel contributions: first, we show that the problem is equivalent to estimating the vanishing line of the plane; second, we show that estimating the two degrees of freedom of this line can be decomposed into two searches each for one parameter; third, we give an algorithm for this estimation which is applicable to both regular and irregular textures. The algorithms do not require that texels are identified explicitly. But once the plane vanishing line has been obtained, then texels locations can be determined, and the geometry of the scene plane computed up to an affine transformation. We give examples of these computations on real images.

Book ChapterDOI
01 Jan 2000
TL;DR: A novel approach to reconstructing the complete surface of an object from multiple views, where the camera circumnavigates the object, which combines the information available from the apparent contour with the information Available from the imaged surface texture.
Abstract: We describe a novel approach to reconstructing the complete surface of an object from multiple views, where the camera circumnavigates the object. The approach combines the information available from the apparent contour with the information available from the imaged surface texture.

Book ChapterDOI
26 Jun 2000
TL;DR: It is shown that the affine and Euclidean calibrations involve quadratic constraints, and an algorithm based on a conic intersection technique is described to solve them.
Abstract: This paper describes a method for autocalibrating a stereo rig. A planar object performing general and unknown motions is observed by the stereo rig and, based on point correspondences only, the autocalibration of the stereo rig is computed. A stratified approach is used and the autocalibration is computed by estimating first the epipolar geometry of the rig, then the plane at infinity Π∞ (affine calibration) and finally the absolute conic Ω∞ (Euclidean calibration). We show that the affine and Euclidean calibrations involve quadratic constraints and we describe an algorithm to solve them based on a conic intersection technique. Experiments with both synthetic and real data are used to evaluate the performance of the method.

Book
01 Jan 2000
TL;DR: Topics studied include a general method for feature matching and model extraction, and characterizing the performance of multiple-image point-correspondence algorithms using self-consistency.
Abstract: Contents:
Correspondence and Tracking:
- An Experimental Comparison of Stereo Algorithms
- A General Method for Feature Matching and Model Extraction
- Characterizing the Performance of Multiple-Image Point-Correspondence Algorithms Using Self-Consistency
- A Sampling Algorithm for Tracking Multiple Objects
- Real-Time Tracking of Complex Structures for Visual Servoing
Geometry and Reconstruction:
- Direct Recovery of Planar-Parallax from Multiple Frames
- Generalized Voxel Coloring
- Projective Reconstruction from N Views Having One View in Common
- Point- and Line-Based Parameterized Image Varieties for Image-Based Rendering
- Recovery of Circular Motion from Profiles of Surfaces
Optimal Reconstruction:
- Optimization Criteria, Sensitivity and Robustness of Motion and Structure Estimation
- Gauge Independence in Optimization Algorithms for 3D Vision
- Uncertainty Modeling for Optimal Structure from Motion
- Error Characterization of the Factorization Approach to Shape and Motion Recovery
- Bootstrapping Errors-in-Variables Models
Invited Talks:
- Annotation of Video by Alignment to Reference Imagery
- Computer-Vision for the Post-production World: Facts and Challenges through the REALViZ Experience
Special Sessions:
- About Direct Methods
- Feature Based Methods for Structure and Motion Estimation
- Discussion for Direct versus Features Session
- Bundle Adjustment - A Modern Synthesis
- Discussion for Session on Bundle Adjustment
- Summary of the Panel Session

Book ChapterDOI
01 Jan 2000
TL;DR: A method to completely automatically recover 3D scene structure together with a camera for each frame from a sequence of images acquired by an unknown camera undergoing unknown movement is described.
Abstract: We describe a method to completely automatically recover 3D scene structure together with a camera for each frame from a sequence of images acquired by an unknown camera undergoing unknown movement. Previous approaches have used calibration objects or landmarks to recover this information, and are therefore often limited to a particular scale. The approach of this paper is far more general, since the “landmarks” are derived directly from the imaged scene texture. The method can be applied to a large class of scenes and motions, and is demonstrated here for sequences of interior and exterior scenes using both controlled-motion and hand-held cameras.