
Showing papers by "Andrew Zisserman" published in 2002


Book ChapterDOI
28 May 2002
TL;DR: This paper investigates how a combination of image invariants, covariants, and multiple view relations can be used in concord to enable efficient multiple view matching, and produces a matching algorithm which is linear in the number of views.
Abstract: There has been considerable success in automated reconstruction for image sequences where small baseline algorithms can be used to establish matches across a number of images. In contrast, in the case of widely separated views, methods have generally been restricted to two or three views. In this paper we investigate the problem of establishing relative viewpoints given a large number of images where no ordering information is provided. A typical application would be where images are obtained from different sources or at different times: both the viewpoint (position, orientation, scale) and lighting conditions may vary significantly over the data set. Such a problem is not fundamentally amenable to exhaustive pairwise and triplet wide baseline matching because this would be prohibitively expensive as the number of views increases. Instead, we investigate how a combination of image invariants, covariants, and multiple view relations can be used in concord to enable efficient multiple view matching. The result is a matching algorithm which is linear in the number of views. The methods are illustrated on several real image data sets. The output enables an image based technique for navigating in a 3D scene, moving from one image to whichever image is the next most appropriate.

670 citations
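The linearity claim rests on indexing invariant descriptors once per image rather than matching every pair of views exhaustively. The sketch below illustrates that indexing idea only: SIFT features, a small k-means vocabulary of "visual words" and an inverted index stand in for the paper's invariants and covariants, so the specific calls and thresholds are assumptions for illustration, not the authors' algorithm.

```python
# A minimal sketch (not the paper's method): index local descriptors once per
# image so that candidate view pairs come from a shared index rather than from
# exhaustive pairwise wide-baseline matching.
import cv2
import numpy as np
from collections import defaultdict

def index_images(image_paths, n_words=200):
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    all_desc = np.vstack(per_image).astype(np.float32)

    # Quantise descriptors into a small vocabulary of "visual words".
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1e-3)
    _, _, centers = cv2.kmeans(all_desc, n_words, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

    # Inverted index: visual word -> set of images containing it. Each image is
    # visited once, so building the index is linear in the number of views.
    inverted = defaultdict(set)
    for img_id, desc in enumerate(per_image):
        if len(desc) == 0:
            continue
        words = np.argmin(np.linalg.norm(desc[:, None] - centers[None], axis=2), axis=1)
        for w in set(words.tolist()):
            inverted[w].add(img_id)
    return inverted

def candidate_pairs(inverted, min_shared_words=20):
    """Views that share many visual words are candidates for verification."""
    votes = defaultdict(int)
    for imgs in inverted.values():
        imgs = sorted(imgs)
        for i in range(len(imgs)):
            for j in range(i + 1, len(imgs)):
                votes[(imgs[i], imgs[j])] += 1
    return [pair for pair, v in votes.items() if v >= min_shared_words]
```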


Book ChapterDOI
28 May 2002
TL;DR: This paper presents a novel method of classifying a single image without requiring any a priori knowledge about the viewing or illumination conditions under which it was photographed, and argues that using rotationally invariant filters while clustering in such a low dimensional space improves classification performance.
Abstract: In this paper we present a new approach to material classification under unknown viewpoint and illumination. Our texture model is based on the statistical distribution of clustered filter responses. However, unlike previous 3D texton representations, we use rotationally invariant filters and cluster in an extremely low dimensional space. Having built a texton dictionary, we present a novel method of classifying a single image without requiring any a priori knowledge about the viewing or illumination conditions under which it was photographed. We argue that using rotationally invariant filters while clustering in such a low dimensional space improves classification performance and demonstrate this claim with results on all 61 textures in the Columbia-Utrecht database. We then proceed to show how texture models can be further extended by compensating for viewpoint changes using weak isotropy. The new clustering and classification methods are compared to those of Leung and Malik (ICCV 1999), Schmid (CVPR 2001) and Cula and Dana (CVPR 2001), which are the current state-of-the-art approaches.

396 citations
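As a rough illustration of the pipeline the abstract describes (rotationally invariant filter responses, a texton dictionary learned by clustering, classification of a single image by comparing texton histograms), here is a minimal sketch. The Gaussian and Laplacian-of-Gaussian bank, the number of textons and the chi-square nearest-neighbour rule are simplifications of my own; the paper's filter bank and the CUReT evaluation protocol are richer.

```python
# A simplified texton-classification sketch with rotationally invariant filters.
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace
from scipy.cluster.vq import kmeans2

SIGMAS = (1.0, 2.0, 4.0)

def filter_responses(img):
    """Per-pixel responses of a small rotation-invariant filter bank."""
    img = img.astype(np.float64)
    chans = [gaussian_filter(img, s) for s in SIGMAS]
    chans += [gaussian_laplace(img, s) for s in SIGMAS]
    return np.stack([c.ravel() for c in chans], axis=1)     # (n_pixels, n_filters)

def learn_textons(training_images, k=32):
    data = np.vstack([filter_responses(im) for im in training_images])
    textons, _ = kmeans2(data, k, minit='++')
    return textons

def texton_histogram(img, textons):
    resp = filter_responses(img)
    labels = np.argmin(((resp[:, None] - textons[None]) ** 2).sum(axis=2), axis=1)
    hist = np.bincount(labels, minlength=len(textons)).astype(np.float64)
    return hist / hist.sum()

def chi2(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify(img, textons, model_hists, model_labels):
    """Nearest-neighbour classification of a single image by chi-square distance."""
    h = texton_histogram(img, textons)
    d = [chi2(h, m) for m in model_hists]
    return model_labels[int(np.argmin(d))]
```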


Book ChapterDOI
28 May 2002
TL;DR: It is demonstrated that convincing texture mapped models are generated which include the main walls and roofs, together with inset windows and also protruding (dormer) roof windows, and the performance of this technique is assessed.
Abstract: We investigate a strategy for reconstructing buildings from multiple (uncalibrated) images. In a similar manner to the Facade approach, we first generate a coarse piecewise planar model of the principal scene planes and their delineations, and then use these facets to guide the search for indentations and protrusions such as windows and doors. However, unlike the Facade approach, which involves manual selection and alignment of the geometric primitives, the strategy here is fully automatic. There are several points of novelty: first, we demonstrate that the use of quite generic models together with particular scene constraints (the availability of several principal directions) is sufficiently powerful to enable successful reconstruction of the targeted scenes. Second, we develop and refine a technique for piecewise planar model fitting involving sweeping polygonal primitives, and assess the performance of this technique. Third, lines at infinity are constructed from image correspondences and used to sweep planes in the principal directions. The strategy is illustrated on several image triplets of College buildings. It is demonstrated that convincing texture mapped models are generated which include the main walls and roofs, together with inset windows and also protruding (dormer) roof windows.

248 citations
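The sweeping of primitives is driven by photo-consistency: a facet hypothesis is scored by how well the images agree when mapped through it. The sketch below shows only the generic plane-sweep scoring idea, in a calibrated two-view setting with fronto-parallel planes; the paper sweeps polygonal primitives along scene principal directions in an uncalibrated reconstruction, so the parameterisation here is an assumption for illustration.

```python
# A generic plane-sweep sketch: warp a second view onto a family of hypothesised
# planes and score photo-consistency (border effects from warping are ignored).
import cv2
import numpy as np

def plane_sweep_scores(img_ref, img_other, K, R, t, depths):
    """Score each candidate plane Z = depth (in the reference frame) by normalised
    cross-correlation after warping img_other onto that plane."""
    Kinv = np.linalg.inv(K)
    ref = img_ref.astype(np.float32)
    ref_n = (ref - ref.mean()) / (ref.std() + 1e-6)
    n = np.array([0.0, 0.0, 1.0])                    # fronto-parallel plane normal
    scores = []
    for d in depths:
        # Plane-induced homography (reference -> other): H = K (R - t n^T / d) K^{-1}.
        H = K @ (R - np.outer(t, n) / d) @ Kinv
        # WARP_INVERSE_MAP makes dst(x) = src(H x), i.e. sample img_other at H x_ref.
        warped = cv2.warpPerspective(img_other.astype(np.float32), H,
                                     (img_ref.shape[1], img_ref.shape[0]),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        w_n = (warped - warped.mean()) / (warped.std() + 1e-6)
        scores.append(float((ref_n * w_n).mean()))
    return scores    # the peak indicates the best-supported plane position
```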


Journal ArticleDOI
TL;DR: This paper provides a statistical estimation framework to quantify PVE and to propagate voxel-based estimates in order to compute global magnitudes, such as volume, with associated estimates of uncertainty.

184 citations


Book ChapterDOI
28 May 2002
TL;DR: It is demonstrated that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance, and the affine invariant measure introduced may be obtained in closed form.
Abstract: We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision. We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the particular special case of corresponding point sets we demonstrate that the affine invariant measure we introduced may be obtained in closed form. As an application of these ideas we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test as it involves detecting and clustering over tens of thousands of images with the variances including changes in viewpoint, lighting, scale and expression.

160 citations
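For the special case of corresponding point sets mentioned above, the closed form comes down to a linear least-squares affine fit: the distance is the residual left after the best affine alignment. The sketch below shows that plain closed-form computation only; it omits the symmetrisation and the priors on transformation parameters that the paper adds.

```python
# A minimal sketch: residual after the best least-squares affine alignment of
# corresponding point sets (no transformation priors, unlike the paper).
import numpy as np

def affine_residual(X, Y):
    """X, Y: (n, 2) arrays of corresponding points. Returns the RMS residual
    after fitting Y ~ A X + t by linear least squares (closed form)."""
    n = len(X)
    design = np.hstack([X, np.ones((n, 1))])          # rows [x, y, 1]
    M, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)   # 3x2 matrix [A^T; t^T]
    pred = design @ M
    return float(np.sqrt(np.mean(np.sum((pred - Y) ** 2, axis=1))))

# Example: a rotated, scaled and translated square has (near) zero affine residual.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
assert affine_residual(square, 2.0 * square @ R.T + np.array([3.0, -1.0])) < 1e-9
```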


Book ChapterDOI
28 May 2002
TL;DR: This paper presents a multiple view algorithm for computing the alpha matte using a Bayesian framework, which enables virtual objects to be added between the foreground and background layers, and gives examples of this augmentation to the original sequences.
Abstract: When estimating foreground and background layers (or equivalently an alpha matte), it is often the case that pixel measurements contain mixed colours which are a combination of foreground and background. Object boundaries, especially at thin sub-pixel structures like hair, pose a serious problem. In this paper we present a multiple view algorithm for computing the alpha matte. Using a Bayesian framework, we model each pixel as a combined sample from the foreground and background and compute a MAP estimate to factor the two. The novelties in this work include the incorporation of three different types of priors for enhancing the results in problematic scenes. The priors used are inequality constraints on colour and alpha values, spatial continuity, and the probability distribution of alpha values. The combination of these priors results in accurate and visually satisfying estimates. We demonstrate the method on real image sequences with varying degrees of geometric and photometric complexity. The output enables virtual objects to be added between the foreground and background layers, and we give examples of this augmentation to the original sequences.

86 citations
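The model underlying the matte is the compositing relation C = αF + (1 − α)B at each pixel. A minimal sketch of just that relation follows: given colour estimates F and B, the least-squares α has a closed form; the paper's Bayesian MAP estimate additionally brings in the three priors described above, which are omitted here.

```python
# Per-pixel alpha from the compositing relation C = alpha*F + (1 - alpha)*B.
import numpy as np

def alpha_from_colours(C, F, B, eps=1e-8):
    """C, F, B: (..., 3) RGB arrays. Returns alpha in [0, 1] per pixel."""
    C, F, B = (np.asarray(a, dtype=np.float64) for a in (C, F, B))
    num = np.sum((C - B) * (F - B), axis=-1)
    den = np.sum((F - B) ** 2, axis=-1) + eps
    return np.clip(num / den, 0.0, 1.0)

# Example: a half-transparent red pixel over a blue background.
F, B = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
C = 0.5 * F + 0.5 * B
print(alpha_from_colours(C, F, B))   # ~0.5
```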


Book ChapterDOI
18 Jul 2002
TL;DR: It is demonstrated that wide baseline matching techniques can be successfully employed for this task by matching key frames between shots, with each frame represented by a set of viewpoint invariant local feature vectors.
Abstract: We describe progress in matching shots which are images of the same 3D scene in a film. The problem is hard because the camera viewpoint may change substantially between shots, with consequent changes in the imaged appearance of the scene due to foreshortening, scale changes and partial occlusion. We demonstrate that wide baseline matching techniques can be successfully employed for this task by matching key frames between shots. The wide baseline method represents each frame by a set of viewpoint invariant local feature vectors. The local spatial support of the features means that segmentation of the frame (e.g. into foreground/background) is not required, and partial occlusion is tolerated. Results of matching shots for a number of different scene types are illustrated on a commercial film.

81 citations
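A rough sketch of the matching step: represent each key frame by local invariant descriptors, take ratio-test matches, and verify them with an epipolar-geometry RANSAC. SIFT and OpenCV's fundamental-matrix estimator stand in here for the paper's viewpoint invariant regions, so the specific calls and thresholds are assumptions.

```python
# Keyframe matching sketch: local invariant features + ratio test + epipolar check.
import cv2
import numpy as np

def shot_match_score(frame_a, frame_b, ratio=0.8):
    """Returns the number of geometrically verified matches between two key frames."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(frame_a, None)
    kp_b, des_b = sift.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return 0

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 8:
        return 0

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])
    _, inlier_mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 3.0, 0.99)
    return int(inlier_mask.sum()) if inlier_mask is not None else 0
```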


Proceedings ArticleDOI
21 Jul 2002
TL;DR: Environment matting is a powerful technique for modelling the complex light-transport properties of real-world optically active elements: transparent, refractive and reflective objects.
Abstract: Environment matting is a powerful technique for modelling the complex light-transport properties of real-world optically active elements: transparent, refractive and reflective objects. Zongker et al [1999] and Chuang et al [2000] show how environment mattes can be computed for real objects under carefully controlled laboratory conditions. However, for many objects of interest, such calibration is difficult to arrange. For example, we might wish to determine the distortion caused by filming through an ancient window where the glass has flowed; we may have access only to archive footage; or we might simply want a more convenient means of acquiring the matte.

65 citations


Book ChapterDOI
28 May 2002
TL;DR: A new approach for recovering 3D geometry from an uncalibrated image sequence of a single axis (turntable) motion based on fitting a conic locus to corresponding image points over multiple views is described.
Abstract: In this paper, we describe a new approach for recovering 3D geometry from an uncalibrated image sequence of a single axis (turntable) motion. Unlike previous methods, the computation of multiple view geometry encoded by the fundamental matrix or trifocal tensor is not required. Instead, the new approach is based on fitting a conic locus to corresponding image points over multiple views. It is then shown that the geometry of single axis motion can be recovered given at least two such conics. In the case of two conics the reconstruction may have a two fold ambiguity, but this ambiguity is removed if three conics are used. The approach enables the geometry of the single axis motion (the 3D rotation axis and Euclidean geometry in planes perpendicular to this axis) to be estimated using the minimal number of parameters. It is demonstrated that a Maximum Likelihood Estimation results in measurements that are as good as or superior to those obtained by previous methods, and with a far simpler algorithm. Examples are given on various real sequences, which show the accuracy and robustness of the new algorithm.

57 citations
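The core measurement is that, under turntable motion, each tracked point traces a conic in the image. A minimal sketch of the direct algebraic fit is below; the paper goes further and refines such estimates with a maximum likelihood formulation.

```python
# Algebraic least-squares fit of a conic to a point's image positions over views.
import numpy as np

def fit_conic(points):
    """points: (n, 2) image positions of one tracked point over n >= 5 views.
    Returns (a, b, c, d, e, f) with a x^2 + b x y + c y^2 + d x + e y + f = 0."""
    x, y = points[:, 0], points[:, 1]
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # The conic is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(design)
    return vt[-1]

# Example: points sampled on an ellipse are fitted with near-zero algebraic residual.
t = np.linspace(0, 2 * np.pi, 12, endpoint=False)
pts = np.column_stack([3 * np.cos(t) + 1, 1.5 * np.sin(t) - 2])
conic = fit_conic(pts)
x, y = pts[:, 0], pts[:, 1]
res = conic @ np.vstack([x * x, x * y, y * y, x, y, np.ones_like(x)])
print(np.max(np.abs(res)))   # ~0
```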


01 Jan 2002
TL;DR: Algorithms for a systematic analysis of the two- and three-dimensional geometry of paintings are drawn from the work on "single-view reconstruction" and applied to interpreting works of art from the Italian Renaissance and later periods.
Abstract: This paper explores the use of computer graphics and computer vision techniques in the history of art. The focus is on analysing the geometry of perspective paintings to learn about the perspectival skills of artists and explore the evolution of linear perspective in history. Algorithms for a systematic analysis of the two- and three-dimensional geometry of paintings are drawn from the work on "single-view reconstruction" and applied to interpreting works of art from the Italian Renaissance and later periods. Since a perspectival painting is not a photograph of an actual subject but an artificial construction subject to imaginative manipulation and inadvertent inaccuracies, the internal consistency of its geometry must be assessed before carrying out any geometric analysis. Some simple techniques to analyse the consistency and perspectival accuracy of the geometry of a painting are discussed. Moreover, this work presents new algorithms for generating new views of a painted scene or portions of it, analysing shapes and proportions of objects, filling in occluded areas, performing a complete three-dimensional reconstruction of a painting and a rigorous analysis of possible reconstruction ambiguities. The validity of the techniques described here is demonstrated on a number of historical paintings and frescoes. Whenever possible, the computer-generated results are compared to those obtained by art historians through careful manual analysis. This research represents a further attempt to build a constructive dialogue between two very different disciplines: computer science and history of art. Despite their fundamental differences, science and art can learn and be enriched by each other's procedures. A longer and more detailed version of this paper may be found in [5].

30 citations
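One basic tool from single-view reconstruction used in this kind of analysis is vanishing-point estimation: lines that are parallel in the depicted scene should meet at a single image point, and how tightly a hand-marked bundle converges is a simple probe of perspectival consistency. The sketch below is a generic least-squares version, not the paper's specific procedure.

```python
# Vanishing point as the least-squares intersection of a bundle of image lines.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(lines):
    """lines: (n, 3) homogeneous lines. Least-squares point v with l . v = 0."""
    L = np.asarray(lines, dtype=np.float64)
    L /= np.linalg.norm(L[:, :2], axis=1, keepdims=True)   # normalise each line
    _, _, vt = np.linalg.svd(L)
    v = vt[-1]
    return v[:2] / v[2]

# Example: three hand-marked segments whose lines converge at (10, 5).
segs = [((0, 0), (10, 5)), ((0, 10), (10, 5)), ((5, -3), (10, 5))]
print(vanishing_point([line_through(p, q) for p, q in segs]))   # ~[10, 5]
```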



Proceedings Article
01 Dec 2002
TL;DR: Two representations of filter outputs, textons and binned histograms, are shown to be equivalent and two classification methodologies, nearest neighbour matching and Bayesian classification, are compared.
Abstract: The objective of this paper is classification of materials from a single image obtained under unknown viewpoint and illumination conditions. Texture classification under such general conditions is an extremely challenging task. Our methods are based on the statistical distribution of rotationally invariant filter responses in a low dimensional space. There are two points of novelty: first, two representations of filter outputs, textons and binned histograms, are shown to be equivalent; second, two classification methodologies, nearest neighbour matching and Bayesian classification, are compared. In essence, given the equivalence of texton and bin representations, the paper carries out an exact comparison between the texton based distribution comparison classifiers of Leung and Malik [IJCV 2001], Cula and Dana [CVPR 2001], and Varma and Zisserman [ECCV 2002], and the Bayesian classification scheme of Konishi and Yuille [CVPR
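Assuming per-pixel features have already been quantised into textons or histogram bins, the two classification routes compared in the paper can be sketched as follows: nearest-neighbour matching of normalised histograms versus a Bayesian (naive-Bayes style) score built from per-class bin frequencies. The equal-prior assumption and the smoothing constant are simplifications of my own.

```python
# Two classification routes over the same quantised features (textons or bins).
import numpy as np

def nn_classify(query_hist, class_hists, eps=1e-10):
    """Nearest neighbour by chi-square distance between normalised histograms."""
    d = 0.5 * np.sum((class_hists - query_hist) ** 2 /
                     (class_hists + query_hist + eps), axis=1)
    return int(np.argmin(d))

def bayes_classify(query_bin_labels, class_hists, eps=1e-10):
    """Sum log-likelihoods of the query's per-pixel bin labels under each class's
    bin-frequency model (equal class priors assumed).
    query_bin_labels: integer array of per-pixel bin indices."""
    log_p = np.log(class_hists + eps)                 # (n_classes, n_bins)
    scores = log_p[:, query_bin_labels].sum(axis=1)   # (n_classes,)
    return int(np.argmax(scores))
```

Note that the Bayesian score depends on the query only through its bin counts, which is the sense in which the two representations carry the same information.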

Journal ArticleDOI
01 Jan 2002
TL;DR: This paper investigates three novel minimal combinations of points and lines over three views, and gives complete solutions and reconstruction methods for two of these cases: "four points and three lines in three views", and "two points and six lines in three views".
Abstract: In this paper we address the problem of projective reconstruction of structure and motion given only image data. In particular we investigate three novel minimal combinations of points and lines over three views, and give complete solutions and reconstruction methods for two of these cases: "four points and three lines in three views", and "two points and six lines in three views". We show that in general there are three and seven solutions respectively to these cases. The reconstruction methods are tested on real and simulated data. We also give tentative results for the case of nine lines in correspondence over three views, where experiments indicate that there may be up to 36 complex solutions.
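Why exactly these combinations are minimal can be checked with a standard degree-of-freedom count; the bookkeeping below is generic (the solution counts of three, seven and up to 36 come from the paper's algebraic analysis, not from this count).

```latex
% A standard degree-of-freedom count (not taken from the paper) showing why these
% point/line configurations are minimal for three uncalibrated views.
\begin{align*}
\text{cameras: } & 3 \times 11 - 15 = 18
    && \text{(three projection matrices, modulo the projective ambiguity)} \\
\text{each point: } & 3 \times 2 - 3 = 3
    && \text{(two image measurements per view, three unknowns in space)} \\
\text{each line: } & 3 \times 2 - 4 = 2
    && \text{(two image measurements per view, four unknowns in space)} \\[4pt]
4\ \text{points} + 3\ \text{lines: } & 4 \times 3 + 3 \times 2 = 18 \\
2\ \text{points} + 6\ \text{lines: } & 2 \times 3 + 6 \times 2 = 18 \\
9\ \text{lines: } & 9 \times 2 = 18
\end{align*}
```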

Proceedings ArticleDOI
26 Jul 2002
TL;DR: Analysis of the way in which optical elements distort the appearance of their backgrounds allows environment mattes to be constructed in situ, without the need for specialized calibration; environment matting is a powerful technique for modeling the complex light-transport properties of real-world optically active elements: transparent, refractive and reflective objects.
Abstract: Environment matting is a powerful technique for modeling the complex light-transport properties of real-world optically active elements: transparent, refractive and reflective objects. Recent research has shown how environment mattes can be computed for real objects under carefully controlled laboratory conditions. However, many objects for which environment mattes are necessary for accurate rendering cannot be placed into a calibrated lighting environment. We show in this paper that analysis of the way in which optical elements distort the appearance of their backgrounds allows the construction of environment mattes in situ without the need for specialized calibration. Specifically, given multiple images of the same element over the same background, where the element and background have relative motion, it is shown that both the background and the optical element's light-transport path can be computed. We demonstrate the technique on two different examples. In the first case, the optical element's geometry is simple, and evaluation of the realism of the output is easy. In the second, previous techniques would be difficult to apply. We show that image-based environment matting yields a realistic solution. We discuss how the stability of the solution depends on the number of images used, and how to regularize the solution where only a small number of images are available.
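As a heavily simplified illustration of the in-situ idea: if the background translates by a known integer amount each frame and every pixel behind the element is assumed to sample a single background location, that per-pixel mapping can be recovered by matching intensity sequences over frames. All of these assumptions (known shifts, single-ray transport, no attenuation) are mine for the sketch; the paper's light-transport model and its regularisation are considerably richer.

```python
# Toy recovery of a per-pixel background offset from frames with a moving background.
import numpy as np

def estimate_offsets(frames, background, shifts, search=10):
    """frames: list of (H, W) images; background: (H, W); shifts: list of (dy, dx)
    integer background translations per frame. Returns per-pixel (dy, dx) offsets."""
    H, W = background.shape
    frames = np.stack([f.astype(np.float64) for f in frames])      # (T, H, W)
    best_err = np.full((H, W), np.inf)
    best_off = np.zeros((H, W, 2), dtype=int)
    ys, xs = np.mgrid[0:H, 0:W]
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            err = np.zeros((H, W))
            for t, (sy, sx) in enumerate(shifts):
                yy = np.clip(ys + dy + sy, 0, H - 1)
                xx = np.clip(xs + dx + sx, 0, W - 1)
                err += (frames[t] - background[yy, xx]) ** 2
            better = err < best_err
            best_err[better] = err[better]
            best_off[better] = (dy, dx)
    return best_off
```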

Proceedings ArticleDOI
24 Jun 2002
TL;DR: This paper reviews two research themes: wide baseline matching (establishing correspondences and cameras for images acquired from very different viewpoints) and automated surface reconstruction of a 3D scene from close-range images.
Abstract: This paper reviews two research themes. The first is wide baseline matching - establishing correspondences and cameras for images acquired from very different viewpoints. The second is automated surface reconstruction of a 3D scene from close-range images. These two themes are linked in an example application of automated architectural reconstruction from images. We demonstrate reconstructions of several university buildings from multiple photographs.
