
Showing papers presented at "British Machine Vision Conference in 1999"


Proceedings ArticleDOI
01 Jan 1999
TL;DR: This work introduces a multi-view nonlinear shape model utilising 2D view-dependent constraint without explicit reference to 3D structures, and adopts Kernel PCA based on Support Vector Machines.
Abstract: Recovering the shape of any 3D object using multiple 2D views requires establishing correspondence between feature points at different views. However, changes in viewpoint introduce self-occlusions, resulting in nonlinear variations in the shape and inconsistent 2D features between views. Here we introduce a multi-view nonlinear shape model utilising 2D view-dependent constraint without explicit reference to 3D structures. For nonlinear model transformation, we adopt Kernel PCA based on Support Vector Machines.

254 citations
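
A minimal sketch of the kernel PCA step described above, using scikit-learn on flattened 2D landmark vectors. The synthetic shapes and parameter values are illustrative stand-ins, not the authors' data or implementation.

```python
# Sketch: a nonlinear (kernel PCA) shape model over flattened 2D landmark vectors.
# Synthetic shapes with a viewpoint-like nonlinear deformation stand in for real data.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
n_shapes, n_landmarks = 200, 20

# Each training shape is (x1, y1, ..., xK, yK); x-coordinates are foreshortened
# by a "view angle" t, so the variation across shapes is nonlinear in t.
theta = rng.uniform(-np.pi / 3, np.pi / 3, n_shapes)
base = rng.normal(0.0, 1.0, (n_landmarks, 2))
shapes = np.stack([(base * [np.cos(t), 1.0]).ravel() for t in theta])
shapes += rng.normal(0.0, 0.01, shapes.shape)

# Kernel PCA with an RBF kernel; fit_inverse_transform learns an approximate
# pre-image map so shapes can be projected into the model and reconstructed.
kpca = KernelPCA(n_components=5, kernel="rbf", fit_inverse_transform=True, alpha=1e-3)
kpca.fit(shapes)

# Constrain a noisy query shape by projecting it onto the nonlinear model.
query = shapes[0] + rng.normal(0.0, 0.05, shapes.shape[1])
recon = kpca.inverse_transform(kpca.transform(query[None, :]))[0]
print("distance to clean shape:", np.linalg.norm(recon - shapes[0]))
```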


Proceedings ArticleDOI
16 Sep 1999
TL;DR: A simple, geometrically intuitive method which exploits the strong rigidity constraints of parallelism and orthogonality present in indoor and outdoor architectural scenes to recover the projection matrices for each viewpoint is proposed.
Abstract: We address the problem of recovering 3D models from uncalibrated images of architectural scenes. We propose a simple, geometrically intuitive method which exploits the strong rigidity constraints of parallelism and orthogonality present in indoor and outdoor architectural scenes. We present a novel algorithm that uses these simple constraints to recover the projection matrices for each viewpoint and relate our method to the algorithm of Caprile and Torre [2]. The projection matrices are used to recover partial 3D models of the scene and these can be used to visualise new viewpoints. Our approach does not need any a priori information about the cameras being used. A working system called PhotoBuilder has been designed and implemented to allow a user to interactively build a VRML model of a building from uncalibrated images from arbitrary viewpoints [3, 4].

249 citations
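
The parallelism and orthogonality constraints exploited here connect to a classical calibration fact: vanishing points of orthogonal scene directions constrain the focal length. Below is a simplified sketch of that single step, assuming square pixels, zero skew and a principal point at the image centre; it is not the paper's full algorithm, and the vanishing-point coordinates are hypothetical.

```python
# Sketch: focal length from two vanishing points of orthogonal scene directions,
# assuming square pixels, zero skew and a known principal point (image centre).
import numpy as np

def focal_from_orthogonal_vps(v1, v2, principal_point):
    """Two vanishing points of orthogonal directions satisfy
    (v1 - c) . (v2 - c) + f^2 = 0, with c the principal point."""
    d1 = np.asarray(v1, float) - principal_point
    d2 = np.asarray(v2, float) - principal_point
    f_sq = -np.dot(d1, d2)
    if f_sq <= 0:
        raise ValueError("vanishing points inconsistent with orthogonality")
    return np.sqrt(f_sq)

c = np.array([320.0, 240.0])        # assumed principal point (image centre)
vp_x = np.array([1250.0, 230.0])    # hypothetical vanishing point, one set of parallel edges
vp_y = np.array([-95.0, 260.0])     # hypothetical vanishing point, orthogonal edges
print("estimated focal length (pixels):", focal_from_orthogonal_vps(vp_x, vp_y, c))
```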


Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper finds that the ASM is faster and achieves more accurate feature point location than the AAM, but the AAM gives a better match to the texture.
Abstract: Statistical models of the shape and appearance of image structures can be matched to new images using both the Active Shape Model [7] algorithm and the Active Appearance Model algorithm [2]. The former searches along profiles about the current model point positions to update the current estimate of the shape of the object. The latter samples the image data under the current instance and uses the difference between model and sample to update the appearance model parameters. In this paper we compare and contrast the two algorithms, giving the results of experiments testing their performance on two data sets, one of faces, the other of structures in MR brain sections. We find that the ASM is faster and achieves more accurate feature point location than the AAM, but the AAM gives a better match to the texture.

188 citations


Proceedings ArticleDOI
13 Sep 1999
TL;DR: This work presents an approach for 3D reconstruction of objects from a single image based on user-provided coplanarity, perpendicularity and parallelism constraints, used to calibrate the image and perform 3D reconstruction.
Abstract: We present an approach for 3D reconstruction of objects from a single image. Obviously, constraints on the 3D structure are needed to perform this task. Our approach is based on user-provided coplanarity, perpendicularity and parallelism constraints. These are used to calibrate the image and perform 3D reconstruction. The method is described in detail and results are provided.

147 citations


Proceedings ArticleDOI
13 Sep 1999
TL;DR: This work considers the self-calibration problem for a moving camera whose intrinsic parameters are known, except the focal length, which may vary freely across different views, and gives a complete catalogue of the so-called critical motion sequences, which is then used to derive the critical motion sequences for stereo systems with variable focal lengths.
Abstract: We consider the self-calibration problem for a moving camera whose intrinsic parameters are known, except the focal length, which may vary freely across different views. The conditions under which the determination of the focal length's values for an image sequence is not possible are derived. These depend only on the camera's motions. We give a complete catalogue of the so-called critical motion sequences. This is then used to derive the critical motion sequences for stereo systems with variable focal lengths.

82 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A number of similarity measures are proposed, based on indices derived from the DT, the DT itself and the DT deviatoric, used to drive an elastic matching algorithm applied to the task of registration of 3D images of the human brain.
Abstract: In this paper, we discuss matching of diffusion tensor (DT) MRIs of the human brain. Issues concerned with matching and transforming these complex images are discussed. A number of similarity measures are proposed, based on indices derived from the DT, the DT itself and the DT deviatoric. Each measure is used to drive an elastic matching algorithm applied to the task of registration of 3D images of the human brain. The performance of the various similarity measures is compared empirically by use of several quality of match measures computed over a pair of matched images. Results indicate that the best matches are obtained from a Euclidean difference measure using the full DT.

69 citations
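
The best-performing measure reported, a Euclidean difference over the full diffusion tensor, amounts to a voxel-wise Frobenius distance between 3x3 tensors. A minimal sketch, with synthetic tensor volumes standing in for real DT-MRI data:

```python
# Sketch of the Euclidean (Frobenius) difference between two diffusion-tensor
# images: each voxel holds a 3x3 symmetric tensor. Synthetic data only.
import numpy as np

def dt_euclidean_difference(dt_a, dt_b):
    """Voxel-wise Frobenius distance between two DT volumes of shape (..., 3, 3)."""
    diff = dt_a - dt_b
    return np.sqrt(np.sum(diff * diff, axis=(-2, -1)))

def random_spd_field(rng, shape):
    # Symmetric positive-definite tensors A A^T + eps I at every voxel.
    a = rng.normal(size=shape + (3, 3))
    return a @ np.swapaxes(a, -2, -1) + 0.1 * np.eye(3)

rng = np.random.default_rng(1)
dt_ref = random_spd_field(rng, (16, 16, 16))
dt_mov = dt_ref + 0.05 * rng.normal(size=dt_ref.shape)  # perturbed copy

dist = dt_euclidean_difference(dt_ref, dt_mov)
print("mean voxel-wise tensor distance:", dist.mean())
```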


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A formulation of convexity as a criterion of good part decomposition is defined and validated by applying it to some simple shapes as well as by showing its close correspondence with Hoffman and Singh’s part saliency factors.
Abstract: The partitioning of two dimensional (2-D) shapes into subparts is an important component of shape analysis. The paper defines a formulation of convexity as a criterion of good part decomposition. Its appropriateness is validated by applying it to some simple shapes as well as by showing its close correspondence with Hoffman and Singh's (1997) part saliency factors.

63 citations
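
A common convexity proxy for 2-D shapes, not necessarily the exact formulation defined in this paper, is the ratio of a polygon's area to the area of its convex hull; parts scoring well below 1 are candidates for further decomposition. A minimal sketch:

```python
# Sketch: score a 2-D polygon's convexity as area(polygon) / area(convex hull).
# This is a generic convexity proxy, not necessarily the paper's exact measure.
import numpy as np
from scipy.spatial import ConvexHull

def polygon_area(pts):
    """Shoelace formula; pts is an (N, 2) array of vertices in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def convexity(pts):
    hull = ConvexHull(pts)
    return polygon_area(pts) / hull.volume  # for 2-D input, .volume is the hull area

square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], float)
l_shape = np.array([[0, 0], [2, 0], [2, 1], [1, 1], [1, 2], [0, 2]], float)
print("square  convexity:", convexity(square))    # 1.0 (already convex)
print("L-shape convexity:", convexity(l_shape))   # < 1.0 (a candidate for splitting)
```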


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A technique for extracting structural features from cursive Arabic script is presented; the features extracted from the binary word image are used to train a hidden Markov model that performs the recognition.
Abstract: We present a technique for extracting structural features from cursive Arabic script. After preprocessing, the skeleton of the binary word image is decomposed into a number of segments in a certain order. Each segment is transformed into a feature vector. The target features are the curvature of the segment, its length relative to other segment lengths of the same word, the position of the segment relative to the centroid of the skeleton, and detailed description of curved segments. The result of this method is used to train the Hidden Markov Model to perform the recognition.

55 citations
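
The final stage, training a hidden Markov model on per-segment feature vectors, could be prototyped with the hmmlearn package. The sketch below trains one Gaussian HMM per word class on synthetic feature sequences; the topology, features and class names are assumptions, not the paper's exact configuration.

```python
# Sketch: one Gaussian HMM per word class, trained on sequences of per-segment
# feature vectors (curvature, relative length, position, ...). Synthetic data.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(2)
n_features = 4  # e.g. curvature, relative length, position w.r.t. centroid, ...

def synthetic_word(mean, n_segments):
    """A word is a variable-length sequence of segment feature vectors."""
    return mean + 0.1 * rng.normal(size=(n_segments, n_features))

def train_class_model(examples, n_states=3):
    X = np.vstack(examples)
    lengths = [len(e) for e in examples]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

class_means = {"word_a": np.array([0.0, 0.5, 1.0, -0.5]),
               "word_b": np.array([1.0, -0.5, 0.0, 0.5])}
models = {w: train_class_model([synthetic_word(m, rng.integers(5, 9))
                                for _ in range(30)])
          for w, m in class_means.items()}

# Recognition: pick the class whose HMM assigns the test sequence the highest score.
test = synthetic_word(class_means["word_b"], 7)
print(max(models, key=lambda w: models[w].score(test)))
```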


Proceedings ArticleDOI
01 Jan 1999
TL;DR: Three recently published algorithms for registration of multiple partially overlapping point sets are compared with respect to CPU time, ease of implementation, accuracy and stability.
Abstract: Recently, three algorithms for registration of multiple partially overlapping point sets have been published by Pennec [11], Stoddart & Hilton [12] and Benjemma & Schmitt [1]. The problem is of particular interest in the building of surface models from multiple range images taken from several viewpoints. In this paper we perform a comparison of these three algorithms with respect to CPU time, ease of implementation, accuracy and stability.

49 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A novel force field transformation has been developed in which the image is treated as an array of Gaussian attractors that act as the source of a force field, and shows promising results in automatic ear recognition.
Abstract: The overall objective in defining feature space is to reduce the dimensionality of pattern space yet maintaining discriminatory power for classification and invariant description. To meet this objective, in the context of ear biometrics, a novel force field transformation has been developed in which the image is treated as an array of Gaussian attractors that act as the source of a force field. The directional properties of the force field are exploited to automatically locate the extrema of a small number of potential energy wells and associated potential channels. These form the basis of the ear description. This has been applied to a small database of ears and initial results show that the new approach has suitable performance attributes and shows promising results in automatic ear recognition.

44 citations
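
The force field transformation can be sketched directly in NumPy: every pixel attracts every other location with a force proportional to its intensity and falling off with distance. The inverse-square form below is one plausible reading of the transform, not the authors' exact formulation, and a random image stands in for real ear data.

```python
# Sketch of a force-field transform: every pixel attracts every other location
# with a force proportional to its intensity, falling off as the inverse square
# of distance (a simplified stand-in for the paper's Gaussian-attractor model).
import numpy as np

def force_field(image):
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    inten = image.ravel().astype(float)                             # (N,)

    field = np.zeros((h * w, 2))
    for j, r in enumerate(pix):                  # force felt at location r
        d = pix - r                              # vectors towards every source pixel
        dist = np.linalg.norm(d, axis=1)
        dist[j] = np.inf                         # a pixel exerts no force on itself
        field[j] = np.sum(inten[:, None] * d / dist[:, None] ** 3, axis=0)
    return field.reshape(h, w, 2)

rng = np.random.default_rng(3)
ear = rng.random((24, 24))                       # hypothetical small ear image
F = force_field(ear)
mag = np.linalg.norm(F, axis=2)
# Potential wells (candidate features) sit where opposing pulls cancel,
# i.e. where the force magnitude is locally small.
print("weakest-force location:", np.unravel_index(mag.argmin(), mag.shape))
```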


Proceedings ArticleDOI
01 Jan 1999
TL;DR: An algorithm for determining the next best position of a range sensor in 3D space for incrementally recovering an indoor scene is presented, based on a mixed exhaustive search and hill climbing optimisation, and outputs the next position in reasonable time.
Abstract: We present an algorithm for determining the next best position of a range sensor in 3D space for incrementally recovering an indoor scene. The method works in five dimensions: the sensor navigates inside the scene, and can be placed at any 3D position and oriented by a pan-tilt head. The method is based on a mixed exhaustive search and hill climbing optimisation, and outputs the next position in reasonable time. Results are shown on a simulated mobile robot with a simulated range sensor navigating in a CAD model of a scene.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: The results show that orientation-selective Gabor filters enhance differences in pose, and that different filter orientations are optimal at different poses; in contrast, PCA was found to provide an identity-invariant representation in which similarities can be calculated more robustly.
Abstract: Identity-independent estimation of head pose from prototype images is a perplexing task, requiring pose-invariant face detection. The problem is exacerbated by changes in illumination, identity and facial position. Facial images must be transformed in such a way as to emphasise differences in pose, while suppressing differences in identity. We investigate appropriate transformations for use with a similarity-to-prototypes philosophy. The results show that orientation-selective Gabor filters enhance differences in pose, and that different filter orientations are optimal at different poses. In contrast, PCA was found to provide an identity-invariant representation in which similarities can be calculated more robustly. We also investigate the angular resolution at which pose changes can be resolved using our methods. An angular resolution of 10° was found to be sufficiently discriminable at some poses but not at others, while 20° is quite acceptable at most poses.
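
The orientation-selective Gabor filtering is straightforward to reproduce. The sketch below builds a small bank of Gabor kernels at four orientations and records per-orientation response energies, the kind of pose-sensitive representation compared against PCA above; the kernel parameters and the random stand-in image are illustrative.

```python
# Sketch: a small bank of orientation-selective Gabor filters applied to an image,
# giving a pose-sensitive representation of the kind discussed above.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even) Gabor kernel at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier

rng = np.random.default_rng(4)
face = rng.random((64, 64))                      # stand-in for a face image

orientations = np.deg2rad([0, 45, 90, 135])
responses = [fftconvolve(face, gabor_kernel(21, wavelength=8.0, theta=t, sigma=4.0),
                         mode="same")
             for t in orientations]

# A simple pose-sensitive feature vector: energy of each orientation channel.
energy = np.array([np.mean(r ** 2) for r in responses])
print("per-orientation Gabor energy:", energy)
```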

Proceedings ArticleDOI
01 Jan 1999
TL;DR: Evidence is presented that, by combining fragments from different objects, the method can deal successfully with intra-object variability within a class; the approach is more economical and more resistant to occlusion and deformations than methods relying on global object views.
Abstract: We describe an approach to object classification based on the conjunction of multiple class-specific object fragments detected in the image. The method represents members of a given class (such as a face, or a car) using combinations of common sub-structures, termed fragments. These fragments are partial 2-D patterns extracted from example views of objects belonging to the class in question. An object view is covered by multiple, overlapping fragments of several types, and at multiple levels of complexity. We describe the detection of the individual fragments, and the combination of the fragments to detect complete objects. The combination of fragments to form a consistent overall arrangement is based in this scheme on a number of simple mechanisms: the use of overlapping fragments, a spatial `voting' scheme, and constraints imposed on the tolerated location of the fragments within the overall object view. We present experimental results of the application of the method to the detection of face and car views in cluttered scenes and to partially occluded objects. We present evidence that by combining fragments from different objects the method can deal successfully with intra-object variability within a class. The method is more economical and more resistant to occlusion and deformations than methods relying on global object views.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: An edge classifier is derived which distinguishes hyper-spectral edges into the following types: a shadow or geometry edge, a highlight edge, and a material edge.
Abstract: Intensity-based edge detectors cannot distinguish whether an edge is caused by material changes, shadows, surface orientation changes or by highlights. Therefore, our aim is to classify the physical cause of an edge using hyperspectra obtained by a spectrograph. Methods are presented to detect edges in hyperspectral images. In theory, the effect of varying imaging conditions is analyzed for "raw" hyper-spectra, for normalized hyper-spectra, and for hue computed from hyper-spectra. From this analysis, an edge classifier is derived which distinguishes hyper-spectral edges into the following types: (1) a shadow or geometry edge, (2) a highlight edge, (3) a material edge.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A novel approach is presented for automatically acquiring stochastic models of the high-level structure of an activity without the assumption of any prior knowledge for the generation of realistic sample behaviours and the performance of models for long-term temporal prediction.
Abstract: In recent years there has been an increased interest in the modelling and recognition of human activities involving highly structured and semantically rich behaviour such as dance, aerobics, and sign language. A novel approach is presented for automatically acquiring stochastic models of the high-level structure of an activity without the assumption of any prior knowledge. The process involves temporal segmentation into plausible atomic behaviour components and the use of variable length Markov models for the efficient representation of behaviours. Experimental results are presented which demonstrate the generation of realistic sample behaviours and evaluate the performance of models for long-term temporal prediction.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: In this article, a 3D model-based tracking system is presented, which combines modern graphical rendering technology with constrained active contour tracking techniques to create wire-frame-snakes.
Abstract: This paper presents a novel three-dimensional model-based tracking system which has been incorporated into a visual servoing system. The tracking system combines modern graphical rendering technology with constrained active contour tracking techniques to create wire-frame-snakes. It operates in real-time at video frame rate (25 Hz) and is based on an internal CAD model of the object to be tracked. This model is rendered using a binary space partition tree to perform hidden line removal and the visible features are identified on-line at each frame and are tracked in the video feed. The tracking system has been extended to incorporate real-time on-line calibration and tracking of internal camera parameters. Results from on-line calibration and visual servoing experiments are presented.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A hybrid 2D-3D dynamic human model is proposed with which robust matching and tracking of a 3D skeleton model of a human body among multiple views can be performed, together with a method that measures the image ambiguity at each view.
Abstract: We propose a novel framework for a hybrid 2D-3D dynamic human model with which robust matching and tracking of a 3D skeleton model of a human body among multiple views can be performed. We describe a method that measures the image ambiguity at each view. The 3D skeleton model and the correspondence between the model and its 2D images are learnt using hierarchical principal component analysis. Tracking in individual views is performed based on CONDENSATION.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: In this paper, the authors examine the accuracy of the sub-pixel location of edges in the Canny detector using a discrete step edge model and Monte-Carlo simulation, and present data which permits prediction of the likely error in the sub-pixel estimate as a function of grey level step height in the presence of noise.
Abstract: We examine the accuracy of the sub-pixel location of edges in the Canny detector using a discrete step edge model and Monte-Carlo simulation. Comparison is made with previously published analysis in the continuous domain. In this paper we identify potential systematic errors due to a combination of the width of the smoothing kernel and the quadratic interpolation scheme, which we show can be reduced to less than one-fiftieth of a pixel with a lookup table. We present data which permits prediction of the likely error in the sub-pixel estimate as a function of grey level step height in the presence of noise.
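
The quadratic interpolation scheme whose bias is analysed here is the usual three-point parabolic fit to gradient-magnitude samples around a Canny maximum. A minimal sketch of that estimator (not the paper's lookup-table correction):

```python
# Sketch: sub-pixel edge localisation by fitting a parabola to three gradient-
# magnitude samples (g_minus, g_centre, g_plus) around a Canny maximum. This is
# the interpolation step whose bias the paper analyses, not its lookup-table fix.
def subpixel_offset(g_minus, g_centre, g_plus):
    """Offset of the parabola's peak from the centre sample, in pixels."""
    denom = g_minus - 2.0 * g_centre + g_plus
    if denom == 0.0:
        return 0.0
    return 0.5 * (g_minus - g_plus) / denom

# A synthetic peak whose true maximum lies 0.3 px to the right of the centre sample.
true_offset = 0.3
samples = [-(x - true_offset) ** 2 + 10.0 for x in (-1.0, 0.0, 1.0)]
print("estimated offset:", subpixel_offset(*samples))   # ~0.3 for an exact parabola
```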

Proceedings ArticleDOI
01 Jan 1999
TL;DR: In this paper, an extensive study of the SVM sensitivity to various processing steps in the context of face authentication is presented, where the authors evaluate the impact of the representation space and photometric normalization technique on SVM performance.
Abstract: We present an extensive study of the support vector machine (SVM) sensitivity to various processing steps in the context of face authentication. In particular, we evaluate the impact of the representation space and photometric normalisation technique on the SVM performance. Our study supports the hypothesis that the SVM approach is able to extract the relevant discriminatory information from the training data. We believe that this is the main reason for its superior performance over benchmark methods (e.g. the eigenface technique). However, when the representation space already captures and emphasises the discriminatory information content (e.g. the fisherface method), the SVMs cease to be superior to the benchmark techniques. The SVM performance evaluation is carried out on a large face database containing 295 subjects.
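
The experimental pipeline, projecting face images into a representation space and training a client-specific SVM, can be sketched with scikit-learn. The data, dimensions and parameters below are placeholders, and per-feature standardisation stands in for the photometric normalisation studied in the paper.

```python
# Sketch: client-vs-impostor face authentication with PCA features and an SVM.
# Synthetic "images" stand in for a real face database; parameters are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)
d = 32 * 32                                      # flattened image size

def synth_faces(centre, n):
    return centre + 0.3 * rng.normal(size=(n, d))

client_centre, impostor_centre = rng.normal(size=d), rng.normal(size=d)
X = np.vstack([synth_faces(client_centre, 40), synth_faces(impostor_centre, 40)])
y = np.array([1] * 40 + [0] * 40)                # 1 = client, 0 = impostor

# Representation space: eigenface-like PCA coefficients; RBF SVM as the verifier.
auth = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf", C=10.0))
auth.fit(X, y)

probe = synth_faces(client_centre, 1)
print("accept claim" if auth.predict(probe)[0] == 1 else "reject claim")
```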

Proceedings ArticleDOI
01 Sep 1999
TL;DR: This paper reviews a number of recently developed stereo matching algorithms and representations that are especially well suited for image-based rendering applications such as novel view generation and the mixing of live imagery with synthetic computer graphics.
Abstract: This paper reviews a number of recently developed stereo matching algorithms and representations. It focuses on techniques that are especially well suited for image-based rendering applications such as novel view generation and the mixing of live imagery with synthetic computer graphics. The paper begins by reviewing some recent approaches to the classic problem of recovering a depth map from two or more images. It then describes a number of newer representations (and their associated reconstruction algorithms), including volumetric representations, layered plane-plus-parallax representations, and multiple depth maps. Each of these techniques has its own strengths and weaknesses, which are discussed.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This work extends SVMs to model the appearance of human faces which undergo nonlinear change across multiple views and uses inherent factors in the nature of the input images and the SVM classification algorithm to perform both multi-view face detection and pose estimation.
Abstract: Support Vector Machines have shown great potential for learning classification functions that can be applied to object recognition. In this work, we extend SVMs to model the appearance of human faces which undergo nonlinear change across multiple views. The approach uses inherent factors in the nature of the input images and the SVM classification algorithm to perform both multi-view face detection and pose estimation.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: Two adaptive methods for describing such a background are proposed, based on Principal and Independent Component Analysis of sampled image patches, which enable the ATR system to adapt to certain backgrounds and identify non-standard elements in the images as targets.
Abstract: Automatic Target Recognition (ATR) is a demanding application that requires separation of targets from a noisy background in a sequence of images. In this paper, two adaptive methods for describing such a background are proposed which are based on Principal and Independent Component Analysis of sampled image patches. Coupled with feature selection and outlier detection techniques, they enable the ATR system to adapt to certain backgrounds and identify non-standard elements in the images as targets. The methods proposed are compared with a standard wavelet-based approach and are shown to perform somewhat better on a difficult image sequence.
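
The PCA-based background description can be sketched as follows: learn a low-dimensional subspace from background patches and flag patches with a large reconstruction error as candidate targets; ICA would slot in similarly. The patches and threshold below are synthetic placeholders.

```python
# Sketch: PCA background model for target detection. Patches that reconstruct
# poorly from the background subspace are flagged as candidate targets.
# (The paper additionally uses ICA and further feature-selection/outlier steps.)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
patch_dim = 8 * 8

# Background patches share a low-dimensional structure; a target patch does not.
basis = rng.normal(size=(5, patch_dim))
background = rng.normal(size=(500, 5)) @ basis + 0.05 * rng.normal(size=(500, patch_dim))
target = rng.normal(size=(1, patch_dim))

pca = PCA(n_components=5).fit(background)

def reconstruction_error(patches):
    recon = pca.inverse_transform(pca.transform(patches))
    return np.linalg.norm(patches - recon, axis=1)

threshold = np.percentile(reconstruction_error(background), 99)
print("target flagged:", bool(reconstruction_error(target)[0] > threshold))
```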

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper provides two algorithms; one for adding eigenspaces, another for subtracting them, thus allowing for incremental updating and downdating of data models, and illustrates the use of the algorithms in three generic applications, including the dynamic construction of Gaussian mixture models.
Abstract: This paper provides two algorithms; one for adding eigenspaces, another for subtracting them, thus allowing for incremental updating and downdating of data models. Importantly, and unlike previous work, we keep an accurate track of the mean of the data, which allows our methods to be used in classification applications. The result of adding eigenspaces, each made from a set of data, is an approximation to that which would obtain were the sets of data taken together. Subtracting eigenspaces yields a result approximating that which would obtain were a subset of data used. Using our algorithms it is possible to perform “arithmetic” on eigenspaces without reference to the original data. We illustrate the use of our algorithms in three generic applications, including the dynamic construction of Gaussian mixture models.
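
The "addition" of eigenspaces with an accurately tracked mean can be derived from the standard rule for merging the covariances of two data sets. The sketch below follows that generic derivation, not necessarily the authors' exact construction, and checks the merged model against a batch model built from the pooled data.

```python
# Sketch: "adding" two eigenspace models (mean, eigenvectors, eigenvalues, count)
# without revisiting the original data, in the spirit of the paper above.
import numpy as np

def build_model(X, n_components):
    """Eigenspace model of a data set: (mean, eigenvectors, eigenvalues, count)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False, bias=True)
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1][:n_components]
    return mean, V[:, order], w[order], len(X)

def add_models(m1, m2, n_components):
    mu1, U1, l1, n1 = m1
    mu2, U2, l2, n2 = m2
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d = (mu1 - mu2)[:, None]

    # Orthonormal basis spanning both eigenspaces and the mean difference.
    basis, _ = np.linalg.qr(np.hstack([U1, U2, d]))

    # Combined covariance expressed in that basis (no original data needed):
    # C = (n1/n) C1 + (n2/n) C2 + (n1*n2/n**2) d d^T, with Ci ~= Ui diag(li) Ui^T.
    def project(U, lam):
        P = basis.T @ U
        return P @ np.diag(lam) @ P.T
    S = (n1 / n) * project(U1, l1) + (n2 / n) * project(U2, l2) \
        + (n1 * n2 / n ** 2) * (basis.T @ d) @ (basis.T @ d).T

    w, R = np.linalg.eigh(S)
    order = np.argsort(w)[::-1][:n_components]
    return mu, basis @ R[:, order], w[order], n

# Check the merged model against a batch model built from the pooled data.
rng = np.random.default_rng(7)
A, B = rng.normal(size=(200, 10)), rng.normal(loc=2.0, size=(300, 10))
merged = add_models(build_model(A, 10), build_model(B, 10), 5)
batch = build_model(np.vstack([A, B]), 5)
print(np.allclose(merged[2], batch[2]), np.allclose(merged[0], batch[0]))
```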

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper introduces a framework to recognise behaviours based on both a learned prior and the continuous propagation of density models of behaviour patterns; the prior is learned from training sequences using hidden Markov models, and the density models are augmented by the current visual observation.
Abstract: Recognition of human behaviours requires modelling the underlying spatial and temporal structures of their motion patterns. Such structures are intrinsically probabilistic and therefore should be modelled as stochastic processes. In this paper we introduce a framework to recognise behaviours based on both a learned prior and the continuous propagation of density models of behaviour patterns. The prior is learned from training sequences using hidden Markov models, and the density models are augmented by the current visual observation.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A robust algorithm, Poppet, is described which can track both unmarked and marked road boundaries; it accumulates edge points in the image and makes a decision on the position of the road boundary based upon the maximum of the accumulated values.
Abstract: There have been many attempts at following road edge boundaries based upon edge information. Most of this work concentrates on the following of road markings. This paper describes a robust algorithm which can track both unmarked road boundaries and marked road boundaries. The Poppet (Position of Pivot Point Estimating Trajectory) algorithm accumulates edge points in the image and makes a decision on the position of the road boundary based upon the maximum of the accumulated values. The merits of using an edge map which is not thresholded are discussed. Finally, the implementation of Poppet using parallel processors is discussed.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper casts the problem of point-set alignment via Procrustes analysis into a maximum likelihood framework using the EM algorithm and shows how alignment can be realised by applying singular value decomposition to a weighted point correlation matrix.
Abstract: This paper casts the problem of point-set alignment via Procrustes analysis into a maximum likelihood framework using the EM algorithm. The aim is to improve the robustness of the Procrustes alignment to noise and clutter. By constructing a Gaussian mixture model over the missing correspondences between individual points, we show how alignment can be realised by applying singular value decomposition to a weighted point correlation matrix. Moreover, by gauging the relational consistency of the assigned correspondence matches, we can edit the point sets to remove clutter. We illustrate the effectiveness of the method on matching stereograms. We also provide a sensitivity analysis to demonstrate the operational advantages of the method.
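
The core computation, recovering the alignment by applying SVD to a weighted point correlation matrix, can be sketched as below. The correspondence weights are fixed by hand here, whereas the paper estimates them with EM over a Gaussian mixture of correspondences.

```python
# Sketch: rigid (rotation + translation) alignment of two 2-D point sets from a
# weighted correlation matrix via SVD. The weights are given here; in the paper
# they come from the EM responsibilities of a Gaussian mixture over matches.
import numpy as np

def weighted_procrustes(X, Y, w):
    """Find R, t minimising sum_i w_i ||R x_i + t - y_i||^2."""
    w = w / w.sum()
    mx, my = w @ X, w @ Y                        # weighted centroids
    C = (Y - my).T @ ((X - mx) * w[:, None])     # weighted point correlation matrix
    U, _, Vt = np.linalg.svd(C)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])   # avoid reflections
    R = U @ D @ Vt
    return R, my - R @ mx

rng = np.random.default_rng(8)
X = rng.normal(size=(30, 2))
angle = np.deg2rad(25.0)
R_true = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
Y = X @ R_true.T + np.array([1.0, -2.0]) + 0.01 * rng.normal(size=X.shape)
Y[:3] += 5.0                                     # a few clutter correspondences

w = np.ones(len(X))
w[:3] = 1e-3                                     # down-weight the clutter
R_est, t_est = weighted_procrustes(X, Y, w)
print("rotation error (deg):",
      np.rad2deg(np.arccos(np.clip(np.trace(R_est.T @ R_true) / 2.0, -1, 1))))
```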

Proceedings ArticleDOI
01 Jan 1999
TL;DR: By incorporating image-based geometric constraints over multiple views, this paper improves on traditional techniques which use purely 3D information; the constraints are chosen to directly target perceptual cues, important to the human visual system, by which errors in AR are most readily perceived.
Abstract: The goal of augmented reality is to insert virtual objects into real video sequences. This paper shows that by incorporating image-based geometric constraints over multiple views, we improve on traditional techniques which use purely 3D information. The constraints imposed are chosen to directly target perceptual cues, important to the human visual system, by which errors in AR are most readily perceived. Imposition of the constraints is achieved by constrained maximum-likelihood estimation, and blends projective, affine and Euclidean geometry as appropriate in different cases. We introduce a number of examples of augmented reality tasks, show how image-based constraints can be incorporated into current 3D-based systems, and demonstrate the improvements conferred.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A new colour space for content based image retrieval is presented, which provides both the ability to measure similarity and determine dissimilarity, using fuzzy logic and psychologically based set theoretic similarity measurement.
Abstract: In this paper a new colour space for content based image retrieval is presented, which is based upon psychophysical research into human perception. It provides both the ability to measure similarity and determine dissimilarity, using fuzzy logic and psychologically based set theoretic similarity measurement. These properties are shown to be equal or superior to conventional colour spaces. Example applications are also demonstrated.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: It is shown how the resulting ctree can be used to remove noise from images, provide a hierarchical way to estimate a dense disparity map from a stereo pair and to provide a basic segmentation of images for image retrieval purposes.
Abstract: Images are re-mapped as scale-space trees. The minimal data structure is then augmented by “complement nodes” to increase the practical value of the representation. It is then shown how the resulting ctree can be used to remove noise from images, provide a hierarchical way to estimate a dense disparity map from a stereo pair and to provide a basic segmentation of images for image retrieval purposes.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: The main result is that there is little reason for preferring the fundamental matrix model over the collineation model, even when the former is the ‘true’ model.
Abstract: Scene geometry can be inferred from point correspondences between two images. The inference process includes the selection of a model. Four models are considered: background (or null), collineation, affine fundamental matrix and fundamental matrix. It is shown how Minimum Description Length (MDL) can be used to compare the different models. The main result is that there is little reason for preferring the fundamental matrix model over the collineation model, even when the former is the ‘true’ model.