
Showing papers in "International Journal of Computer Vision in 1996"


Journal ArticleDOI
TL;DR: The Photobook system, a set of interactive tools for browsing and searching images and image sequences, is described; its query tools make direct use of image content rather than relying on text annotations, and can be combined to provide a sophisticated browsing and search capability.
Abstract: We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These query tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on text annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually-significant coefficients. We discuss three types of Photobook descriptions in detail: one that allows search based on appearance, one that uses 2-D shape, and a third that allows search based on textural properties. These image content descriptions can be combined with each other and with text-based descriptions to provide a sophisticated browsing and search capability. In this paper we demonstrate Photobook on databases containing images of people, video keyframes, hand tools, fish, texture swatches, and 3-D medical data.

1,748 citations


Journal ArticleDOI
TL;DR: It is shown how prior assumptions about the spatial structure of outliers can be expressed as constraints on the recovered analog outlier processes and how traditional continuation methods can be extended to the explicit outlier-process formulation.
Abstract: The modeling of spatial discontinuities for problems such as surface recovery, segmentation, image reconstruction, and optical flow has been intensely studied in computer vision. While “line-process” models of discontinuities have received a great deal of attention, there has been recent interest in the use of robust statistical techniques to account for discontinuities. This paper unifies the two approaches. To achieve this we generalize the notion of a “line process” to that of an analog “outlier process” and show how a problem formulated in terms of outlier processes can be viewed in terms of robust statistics. We also characterize a class of robust statistical problems for which an equivalent outlier-process formulation exists and give a straightforward method for converting a robust estimation problem into an outlier-process formulation. We show how prior assumptions about the spatial structure of outliers can be expressed as constraints on the recovered analog outlier processes and how traditional continuation methods can be extended to the explicit outlier-process formulation. These results indicate that the outlier-process approach provides a general framework which subsumes the traditional line-process approaches as well as a wide class of robust estimation problems. Examples in surface reconstruction, image segmentation, and optical flow are presented to illustrate the use of outlier processes and to show how the relationship between outlier processes and robust statistics can be exploited. An appendix provides a catalog of common robust error norms and their equivalent outlier-process formulations.
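The outlier-process equivalence the abstract describes can be made concrete for one entry of its catalog. The sketch below (our notation, with the scale fixed to 1) checks numerically that the Geman-McClure robust norm rho(r) = r^2/(1 + r^2) equals the minimum over an analog outlier variable z of the penalized quadratic z*r^2 + (sqrt(z) - 1)^2, with the optimal z acting like a continuous line/outlier process:

```python
import numpy as np

# Illustrative sketch (our names, sigma = 1): the Geman-McClure norm
#     rho(r) = r^2 / (1 + r^2)
# has the equivalent analog outlier-process form
#     rho(r) = min_{z in [0,1]}  z * r^2 + (sqrt(z) - 1)^2,
# where z -> 1 for inliers (residual trusted) and z -> 0 for outliers.

def rho(r):
    return r**2 / (1.0 + r**2)

def z_opt(r):
    # closed-form minimizer of the penalty form above
    return 1.0 / (1.0 + r**2)**2

r = np.array([0.0, 0.5, 2.0, 10.0])
z = z_opt(r)
recovered = z * r**2 + (np.sqrt(z) - 1.0)**2   # matches rho(r) term by term
```

The recovered values agree with rho(r) exactly, and z decays toward zero as the residual grows, which is the "analog line process" behaviour the paper generalizes.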

752 citations


Journal ArticleDOI
TL;DR: This paper clarifies the projective nature of the correspondence problem in stereo and shows that the epipolar geometry can be summarized in one 3×3 matrix of rank 2, which the authors propose to call the Fundamental matrix; estimating this matrix from point correspondences is a task of practical importance.
Abstract: In this paper we analyze in some detail the geometry of a pair of cameras, i.e., a stereo rig. Contrary to what has been done in the past and is still done currently, for example in stereo or motion analysis, we do not assume that the intrinsic parameters of the cameras are known (coordinates of the principal points, pixel aspect ratios and focal lengths). This is important for two reasons. First, it is more realistic in applications where these parameters may vary according to the task (active vision). Second, the general case considered here captures all the relevant information that is necessary for establishing correspondences between pairs of images. This information is fundamentally projective and is hidden in a confusing manner in the commonly used formalism of the Essential matrix introduced by Longuet-Higgins (1981). This paper clarifies the projective nature of the correspondence problem in stereo and shows that the epipolar geometry can be summarized in one 3×3 matrix of rank 2 which we propose to call the Fundamental matrix. After this theoretical analysis, we embark on the task of estimating the Fundamental matrix from point correspondences, a task of practical importance. We analyze theoretically, and compare experimentally using synthetic and real data, several methods of estimation. The problem of the stability of the estimation is studied from two complementary viewpoints. First we show that there is an interesting relationship between the Fundamental matrix and three-dimensional planes which induce homographies between the images and create instabilities in the estimation procedures. Second, we point to a deep relation between the instability of the estimation procedure and the presence in the scene of so-called critical surfaces which have been studied in the context of motion analysis.
Finally, we conclude by stressing our belief that the Fundamental matrix will play a crucial role in future applications of three-dimensional computer vision by greatly increasing its versatility, robustness and hence applicability to real, difficult problems.
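A minimal linear estimation of the Fundamental matrix can be sketched with synthetic data. The snippet below (our setup: identity intrinsics, so F coincides with the essential matrix, and no Hartley normalization since the data is noiseless) stacks one linear constraint x2ᵀ F x1 = 0 per correspondence, solves by SVD, and enforces the rank-2 property stressed in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def skew(t):
    # cross-product matrix [t]_x so that skew(t) @ v == np.cross(t, v)
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

a = 0.1  # rotation angle about the y axis (arbitrary synthetic pose)
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
X = rng.uniform(-1, 1, (20, 3)) + [0, 0, 5]   # points in front of both cameras
x1 = X / X[:, 2:3]                            # camera 1: P = [I | 0]
X2 = X @ R.T + t
x2 = X2 / X2[:, 2:3]                          # camera 2: P = [R | t]

# Each correspondence gives one linear constraint x2^T F x1 = 0 on the
# 9 entries of F; solve by SVD and project onto the rank-2 matrices.
A = np.stack([np.outer(p2, p1).ravel() for p1, p2 in zip(x1, x2)])
F = np.linalg.svd(A)[2][-1].reshape(3, 3)
U, S, Vt = np.linalg.svd(F)
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt       # rank-2 Fundamental matrix
```

With noiseless data the epipolar residuals x2ᵀ F x1 vanish to machine precision; with real correspondences one would normalize the points first and use the robust estimators the paper compares.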

707 citations


Journal ArticleDOI
TL;DR: A new framework for nonrigid surface registration, based on several extensions of an iterative algorithm for rigidly registering surfaces represented by sets of 3D points when a prior estimate of the displacement is available.
Abstract: In this paper, we propose a new framework to perform nonrigid surface registration. It is based on various extensions of an iterative algorithm recently presented by several researchers (Besl and McKay, 1992; Champleboux et al., 1992; Chen and Medioni, 1992; Menq and Lai, 1992; Zhang, 1994) to rigidly register surfaces represented by a set of 3D points, when a prior estimate of the displacement is available. Our framework consists of three stages: •First, we search for the best rigid displacement to superpose the two surfaces. We show how to efficiently use curvatures to superpose principal frames at possible corresponding points in order to find a prior rough estimate of the displacement and initialize the iterative algorithm. •Second, we search for the best affine transformation. We introduce differential information in the point coordinates: this allows us to match locally similar points. Then, we show how principal frames and curvatures are transformed by an affine transformation. Finally, we introduce this differential information in a global criterion minimized by extended Kalman filtering in order to ensure the convergence of the algorithm. •Third, we locally deform the surface. Instead of computing a global affine transformation, we attach to each point a local affine transformation varying smoothly along the surface. We call this deformation a locally affine deformation. All these stages are illustrated with experiments on various real biomedical surfaces (teeth, faces, skulls, brains and hearts), which demonstrate the validity of the approach.

386 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of contour inference from partial data, as obtained from state-of-the-art edge detectors, and argues that in order to obtain more perceptually salient contours, it is necessary to impose generic constraints such as continuity and co-curvilinearity.
Abstract: We address the problem of contour inference from partial data, as obtained from state-of-the-art edge detectors. We argue that in order to obtain more perceptually salient contours, it is necessary to impose generic constraints such as continuity and co-curvilinearity. The implementation is in the form of a convolution with a mask which encodes both the orientation and the strength of the possible continuations. We first show how the mask, called the “Extension field”, is derived, then how the contributions from different sites are collected to produce a saliency map. We show that the scheme can handle a variety of input data, from dot patterns to oriented edgels, in a unified manner, and demonstrate results on a variety of input stimuli. We also present a similar approach to the problem of inferring contours formed by end points. In both cases, the scheme is non-linear, non-iterative, and unified in the sense that all types of input tokens are handled in the same manner.

316 citations


Journal ArticleDOI
TL;DR: A flexible object recognition and modelling system is described, built on a generative model for the shapes of animate objects that provides a formalism for solving the inverse problem of object recognition, together with a scheme by which such representations can be learnt automatically from examples.
Abstract: We describe a flexible object recognition and modelling system (FORMS) which represents and recognizes animate objects from their silhouettes. This consists of a model for generating the shapes of animate objects which gives a formalism for solving the inverse problem of object recognition. We model all objects at three levels of complexity: (i) the primitives, (ii) the mid-grained shapes, which are deformations of the primitives, and (iii) objects constructed by using a grammar to join mid-grained shapes together. The deformations of the primitives can be characterized by principal component analysis or modal analysis. When doing recognition the representations of these objects are obtained in a bottom-up manner from their silhouettes by a novel method for skeleton extraction and part segmentation based on deformable circles. These representations are then matched to a database of prototypical objects to obtain a set of candidate interpretations. These interpretations are verified in a top-down process. The system is demonstrated to be stable in the presence of noise, the absence of parts, the presence of additional parts, and considerable variations in articulation and viewpoint. Finally, we describe how such a representation scheme can be automatically learnt from examples.

316 citations


Journal ArticleDOI
TL;DR: A computational model for binocular stereopsis is developed, attempting to explain the process by which the information detailing the 3-D geometry of object surfaces is encoded in a pair of stereo images.
Abstract: We develop a computational model for binocular stereopsis, attempting to explain the process by which the information detailing the 3-D geometry of object surfaces is encoded in a pair of stereo images. We design our model within a Bayesian framework, making explicit all of our assumptions about the nature of image coding and the structure of the world. We start by deriving our model for image formation, introducing a definition of half-occluded regions and deriving simple equations relating these regions to the disparity function. We show that the disparity function alone contains enough information to determine the half-occluded regions. We use these relations to derive a model for image formation in which the half-occluded regions are explicitly represented and computed. Next, we present our prior model in a series of three stages, or “worlds,” where each world considers an additional complication to the prior. We eventually argue that the prior model must be constructed from all of the local quantities in the scene geometry, i.e., depth, surface orientation, object boundaries, and surface creases. In addition, we present a new dynamic programming strategy for estimating these quantities. Throughout the article, we provide motivation for the development of our model by psychophysical examinations of the human visual system.
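The claim that the disparity function alone determines the half-occluded regions can be illustrated on a toy 1-D scanline (our hypothetical integer disparities, not the paper's continuous formulation): left pixel x matches right pixel x - d(x), and right-image pixels that no left pixel maps to are seen by the right camera only, i.e., half-occluded.

```python
import numpy as np

# Hypothetical scanline: a near (large-disparity) surface on the left,
# a far (small-disparity) surface on the right.
d = np.where(np.arange(16) < 8, 5, 2)      # disparity drops by 3 at x = 8
matched = set((np.arange(16) - d).tolist())  # right pixels that receive a match
half_occluded = [x for x in range(-5, 14) if x not in matched]
# The width of the unmatched gap equals the disparity jump.
```

Here the jump of 3 in disparity produces exactly 3 half-occluded right-image pixels, which is the kind of relation the paper's occlusion equations make precise.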

289 citations


Journal ArticleDOI
TL;DR: A structured synopsis of the problems in image motion computation and analysis and of the methods proposed is offered, exposing the underlying models and supporting assumptions.
Abstract: The goal of this paper is to offer a structured synopsis of the problems in image motion computation and analysis, and of the methods proposed, exposing the underlying models and supporting assumptions. A sufficient number of pointers to the literature will be given, concentrating mostly on recent contributions. Emphasis will be on the detection, measurement and segmentation of image motion. Tracking and deformable motion issues will also be addressed. Finally, a number of related questions which could require more investigation will be presented.

275 citations


Journal ArticleDOI
TL;DR: A new type of feature point for 3D surfaces, based on geometric invariants, is introduced, and it is shown that the relative positions of these points are invariant under 3D rigid transforms (rotation and translation).
Abstract: We introduce in this paper a new type of feature point of 3D surfaces, based on geometric invariants. We call these feature points the extremal points of the 3D surfaces, and we show that the relative positions of those 3D points are invariant under 3D rigid transforms (rotation and translation). We show also how to extract those points from 3D images, such as Magnetic Resonance images (MRI) or CAT-scan images, and how to use them to perform precise 3D registration. Previously, we described a method, called the Marching Lines algorithm, which allows us to extract the extremal lines, which are geometrically invariant 3D curves, as the intersection of two implicit surfaces; the extremal points are the intersection of the extremal lines with a third implicit surface. We present an application of extremal point extraction to the fully automatic registration of two 3D images of the same patient, taken in two different positions, to show the accuracy and robustness of the extracted extremal points.
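The invariance the registration relies on can be checked in a few lines. This sketch (synthetic stand-in points and an arbitrary rigid transform, not extracted extremal points) verifies that the relative positions, summarized here by pairwise distances, are unchanged by rotation plus translation:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(6, 3))          # stand-ins for extracted extremal points

# Build a random proper rotation (det = +1) via QR, plus a translation.
A = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(A)
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]
t = np.array([2.0, -1.0, 0.5])
P2 = P @ Q.T + t                     # the same points, rigidly transformed

def pairwise_distances(pts):
    return np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
```

The two distance matrices agree exactly, which is why configurations of extremal points can be matched between two images of the same patient regardless of pose.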

218 citations


Journal ArticleDOI
TL;DR: A heuristic search is proposed to single out the optimal transformation in low-order time for realistic object recognition or registration, where the two range images are often extracted from different viewpoints of the model.
Abstract: A new technique to recognise 3D free-form objects via registration is proposed. This technique attempts to register a free-form surface, represented by a set of 2½D sensed data points, to the model surface, represented by another set of 2½D model data points, without prior knowledge of correspondence or viewpoints between the two point sets. With an initial assumption that the sensed surface be part of a more complete model surface, the algorithm begins by selecting three dispersed, reliable points on the sensed surface. To find the three corresponding model points, the method uses the principal curvatures and the Darboux frames to restrict the search over the model space. Invariably, many possible model 3-tuples will be found. For each hypothesized model 3-tuple, the transformation to match the sensed 3-tuple to the model 3-tuple can be determined. A heuristic search is proposed to single out the optimal transformation in low-order time. For realistic object recognition or registration, where the two range images are often extracted from different viewpoints of the model, the earlier assumption that the sensed surface be part of a more complete model surface cannot be relied on.
With this, the sensed 3-tuple must be chosen such that the three sensed points lie on the common region visible to both the sensed and model views. We propose an algorithm to select a minimal non-redundant set of 3-tuples such that at least one of the 3-tuples will lie on the overlap. Applying the previous algorithm to each 3-tuple within this set, the optimal transformation can be determined. Experiments using data obtained from a range finder have indicated fast registration for relatively complex test cases. If the optimal registrations between the sensed data (candidate) and each of a set of model data are found, then, for 3D object recognition purposes, the minimal best fit error can be used as the decision rule.
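Once a sensed 3-tuple is hypothetically matched to a model 3-tuple, the rigid transform can be recovered in closed form. The sketch below uses a generic least-squares (SVD/Procrustes) solution, which is one standard way to do this step; it is not necessarily the authors' exact computation from Darboux frames:

```python
import numpy as np

def rigid_from_matches(P, Q):
    """Rotation R and translation t minimizing sum ||R p_i + t - q_i||^2."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # cross-covariance of the tuples
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # sign fix keeps det(R) = +1
    return R, cq - R @ cp

# Three dispersed sensed points and their hypothesized model matches.
rng = np.random.default_rng(3)
a = 0.4
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
P = rng.normal(size=(3, 3))                    # sensed 3-tuple
Q = P @ R_true.T + t_true                      # model 3-tuple (exact match)
R_est, t_est = rigid_from_matches(P, Q)
```

With an exact (non-collinear) 3-tuple match the true transform is recovered; in the paper's setting each hypothesized model 3-tuple yields one such candidate transform, and the heuristic search picks the best.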

205 citations


Journal ArticleDOI
TL;DR: An algorithm is developed that estimates a reflectance ratio for each region in an image with respect to its background that is efficient as it computes ratios for all image regions in just two raster scans.
Abstract: Neighboring points on a smoothly curved surface have similar surface normals and illumination conditions. Therefore, their brightness values can be used to compute the ratio of their reflectance coefficients. Based on this observation, we develop an algorithm that estimates a reflectance ratio for each region in an image with respect to its background. The algorithm is efficient as it computes ratios for all image regions in just two raster scans. The region reflectance ratio represents a physical property that is invariant to illumination and imaging parameters. Several experiments are conducted to demonstrate the accuracy and robustness of the ratio invariant. The ratio invariant is used to recognize objects from a single brightness image of a scene. Object models are automatically acquired and represented using a hash table. Recognition and pose estimation algorithms are presented that use ratio estimates of scene regions as well as their geometric properties to index the hash table. The result is a hypothesis for the existence of an object in the image. This hypothesis is verified using the ratios and locations of other regions in the scene. This approach to recognition is effective for objects with printed characters and pictures. Recognition experiments are conducted on images with illumination variations, occlusions, and shadows. The paper is concluded with a discussion on the simultaneous use of reflectance and geometry for visual perception.
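The core observation can be demonstrated on a 1-D toy signal (hypothetical albedos and shading, not the paper's data): because neighboring pixels share normals and illumination, the brightness ratio across a region/background boundary depends only on the two reflectance coefficients, not on the illumination:

```python
import numpy as np

# Slowly varying shading shared by region and background pixels.
shading = 1.0 + 0.3 * np.sin(np.linspace(0.0, 3.0, 64))
rho_region, rho_background = 0.8, 0.4        # hypothetical albedos

I_region = rho_region * shading              # brightness = albedo * shading
I_background = rho_background * shading

ratio = I_region / I_background              # constant: rho_region / rho_background

# Tripling the illumination leaves the ratio unchanged (the invariance).
brighter = 3.0 * shading
ratio2 = (rho_region * brighter) / (rho_background * brighter)
```

The ratio is 0.8/0.4 = 2 at every pixel and is untouched by the illumination change, which is exactly the property that makes it usable as a hash-table index for recognition.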

Journal ArticleDOI
TL;DR: The color reflection model is based on the dichromatic model for dielectric materials and on a color space, called S space, formed with three orthogonal basis functions, which exhibits a more direct correspondence to body colors and to diffuse and specular interface reflections, shading, shadows and inter-reflections than the RGB coordinates.
Abstract: We present a computational model and algorithm for detecting diffuse and specular interface reflections and some inter-reflections. Our color reflection model is based on the dichromatic model for dielectric materials and on a color space, called S space, formed with three orthogonal basis functions. We transform color pixels measured in RGB into the S space and analyze color variations on objects in terms of brightness, hue and saturation, which are defined in the S space. When transforming the original RGB data into the S space, we discount the scene illumination color that is estimated using a white reference plate as an active probe. As a result, the color image appears as if the scene illumination is white. Under the whitened illumination, the interface reflection clusters in the S space are all aligned with the brightness direction. The brightness, hue and saturation values exhibit a more direct correspondence to body colors and to diffuse and specular interface reflections, shading, shadows and inter-reflections than the RGB coordinates. We exploit these relationships to segment the color image, and to separate specular and diffuse interface reflections and some inter-reflections from body reflections. The proposed algorithm is efficacious for uniformly colored dielectric surfaces under singly colored scene illumination. Experimental results conform to our model and algorithm within the limitations discussed.

Journal ArticleDOI
TL;DR: This paper addresses the problem of computing cues to the three-dimensional structure of surfaces in the world directly from the local structure of the brightness pattern of either a single monocular image or a binocular image pair using a multi-scale descriptor of image structure called the windowed second moment matrix.
Abstract: This paper addresses the problem of computing cues to the three-dimensional structure of surfaces in the world directly from the local structure of the brightness pattern of either a single monocular image or a binocular image pair. It is shown that starting from Gaussian derivatives of order up to two at a range of scales in scale-space, local estimates of (i) surface orientation from monocular texture foreshortening, (ii) surface orientation from monocular texture gradients, and (iii) surface orientation from the binocular disparity gradient can be computed without iteration or search, and by using essentially the same basic mechanism. The methodology is based on a multi-scale descriptor of image structure called the windowed second moment matrix, which is computed with adaptive selection of both scale levels and spatial positions. Notably, this descriptor comprises two scale parameters; a local scale parameter describing the amount of smoothing used in derivative computations, and an integration scale parameter determining over how large a region in space the statistics of regional descriptors is accumulated. Experimental results for both synthetic and natural images are presented, and the relation with models of biological vision is briefly discussed.
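The descriptor's two-scale structure can be sketched directly (our kernels and test image; the paper additionally selects both scales adaptively): smooth at a local scale, differentiate, then average the gradient outer products over a window set by the integration scale:

```python
import numpy as np

def gauss1d(sigma):
    x = np.arange(-int(3 * sigma + 1), int(3 * sigma + 1) + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth(img, sigma):
    # Separable Gaussian smoothing along rows, then columns.
    g = gauss1d(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, out)

def windowed_second_moment(img, local_sigma, integration_sigma):
    L = smooth(img, local_sigma)          # local scale: pre-smoothing
    Ly, Lx = np.gradient(L)               # first Gaussian derivatives
    mu_xx = smooth(Lx * Lx, integration_sigma)   # integration scale: windowing
    mu_xy = smooth(Lx * Ly, integration_sigma)
    mu_yy = smooth(Ly * Ly, integration_sigma)
    return mu_xx, mu_xy, mu_yy

img = np.tile(np.sin(np.linspace(0.0, 12.0, 64)), (64, 1))   # vertical stripes
mu_xx, mu_xy, mu_yy = windowed_second_moment(img, 1.0, 3.0)
```

For a vertically striped pattern the matrix is strongly anisotropic (all energy in mu_xx), and it is this anisotropy, measured under foreshortening or a disparity gradient, that the paper turns into surface-orientation estimates.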

Journal ArticleDOI
TL;DR: A new approach is presented that allows the shape and motion to be computed from image sequences without having to know the calibration parameters of the camera, and methods of self-calibration of the affine camera are proposed.
Abstract: A key limitation of all existing algorithms for shape and motion from image sequences under orthographic, weak perspective and para-perspective projection is that they require the calibration parameters of the camera. We present in this paper a new approach that allows the shape and motion to be computed from image sequences without having to know the calibration parameters. This approach is derived with the affine camera model, introduced by Mundy and Zisserman (1992), which is a more general class of projections including orthographic, weak perspective and para-perspective projection models. The concept of self-calibration, introduced by Maybank and Faugeras (1992) for the perspective camera and by Hartley (1994) for the rotating camera, is then applied to the affine camera. This paper introduces the 3 intrinsic parameters that the affine camera can have at most. The intrinsic parameters of the affine camera are closely related to the usual intrinsic parameters of the pin-hole perspective camera, but are different in the general case. Based on the invariance of the intrinsic parameters, methods of self-calibration of the affine camera are proposed. It is shown that with at least four views, an affine camera may be self-calibrated up to a scaling factor, leading to Euclidean (similarity) shape reconstruction up to a global scaling factor. Another consequence of the introduction of intrinsic and extrinsic parameters of the affine camera is that all existing algorithms using calibrated affine cameras can be assembled into the same framework and some of them can be easily extended to a batch solution. Experimental results are presented and compared with other methods using calibrated affine cameras.
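Why uncalibrated affine reconstruction is possible at all can be illustrated with a Tomasi-Kanade-style factorization sketch (our synthetic data; the paper goes further and self-calibrates to resolve the remaining ambiguity): the centered measurement matrix of any affine camera has rank at most 3.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 20))                  # unknown 3D shape, 20 points
views = []
for _ in range(5):                            # 5 views
    A = rng.normal(size=(2, 3))               # arbitrary (uncalibrated) affine camera
    d = rng.normal(size=(2, 1))               # image-plane translation
    views.append(A @ X + d)
W = np.vstack(views)                          # 2F x N measurement matrix

Wc = W - W.mean(axis=1, keepdims=True)        # centering cancels the translations
U, S, Vt = np.linalg.svd(Wc, full_matrices=False)
M = U[:, :3] * S[:3]                          # motion, up to a 3x3 ambiguity
shape = Vt[:3]                                # shape, up to the inverse ambiguity
```

The fourth singular value of the centered matrix is numerically zero, and M @ shape reproduces the measurements exactly; fixing the residual 3x3 ambiguity up to similarity is precisely what the paper's self-calibration constraints accomplish.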

Journal ArticleDOI
TL;DR: A representation of local image structure is proposed which takes into account both the image's spatial structure at a given location, as well as its “deep structure”, that is, its local behaviour as a function of scale or resolution, which is of interest for several low-level image tasks.
Abstract: A representation of local image structure is proposed which takes into account both the image's spatial structure at a given location, as well as its "deep structure", that is, its local behaviour as a function of scale or resolution (scale-space). This is of interest for several low-level image tasks. The proposed basis of scale-space, for example, enables a precise local study of interactions of neighbouring image intensities in the course of the blurring process. It also provides an extrapolation scheme for local image data, obtained at a given spatial location and resolution, to a finite scale-space neighbourhood. This is especially useful for the determination of sampling rates and for interpolation algorithms in a multilocal context. Another, particularly straightforward application is image enhancement or deblurring, which is an instance of data extrapolation in the high-resolution direction. A potentially interesting feature of the proposed local image parametrisation is that it captures a trade-off between spatial and scale extrapolations from a given interior point that do not exceed a given tolerance. This trade-off suggests the possibility of a fairly coarse scale sampling at the expense of a dense spatial sampling (large relative spatial overlap of scale-space kernels). The central concept developed in this paper is an equivalence class called the multiscale local jet, which is a hierarchical, local characterisation of the image in a full scale-space neighbourhood. For this local jet, a basis of fundamental polynomials is constructed that captures the scale-space paradigm at the local level up to any given order.
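The structure that makes extrapolation across scale well defined is the Gaussian semigroup property. This numerical check (a generic 1-D illustration, not the paper's polynomial basis) verifies that blurring with sigma1 and then sigma2 equals a single blur at sqrt(sigma1^2 + sigma2^2):

```python
import numpy as np

def gauss(x, sigma):
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

dx = 0.01
x = np.arange(-8.0, 8.0 + dx, dx)
s1, s2 = 1.0, 1.5

# Discrete approximation of the continuous convolution g_{s1} * g_{s2}.
g12 = np.convolve(gauss(x, s1), gauss(x, s2)) * dx          # 'full' mode
x_full = 2 * x[0] + dx * np.arange(g12.size)                # output grid
target = gauss(x_full, np.sqrt(s1**2 + s2**2))              # predicted kernel
```

The two curves agree to within discretization error, which is why local image data at one scale can be consistently extrapolated to a finite scale-space neighbourhood, as the abstract describes.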

Journal ArticleDOI
TL;DR: The method makes use of a real-time implementation of a corner detector and tracker and reconstructs the image position of the desired fixation point from a cluster of corners detected on the object using the affine structure available from two or three views.
Abstract: We describe a novel method of obtaining a fixation point on a moving object for a real-time gaze control system. The method makes use of a real-time implementation of a corner detector and tracker and reconstructs the image position of the desired fixation point from a cluster of corners detected on the object using the affine structure available from two or three views. The method is fast, reliable, viewpoint invariant, and insensitive to occlusion and/or individual corner dropout or reappearance. We compare two- and three-dimensional forms of the algorithm, present results for the method in use with a high performance head/eye platform, and compare the results with two naive fixation methods.

Journal ArticleDOI
TL;DR: The most general camera model of perspective projection is adopted and it is shown that a point can be predicted in the third image as a bilinear function of its images in the first two cameras, and that the tangents to three corresponding curves are related by a trilinear function.
Abstract: This paper discusses the problem of predicting image features in an image from image features in two other images and the epipolar geometry between the three images. We adopt the most general camera model of perspective projection and show that a point can be predicted in the third image as a bilinear function of its images in the first two cameras, that the tangents to three corresponding curves are related by a trilinear function, and that the curvature of a curve in the third image is a linear function of the curvatures at the corresponding points in the other two images. Our analysis relies heavily on the use of the fundamental matrix which has been recently introduced (Faugeras et al., 1992) and on the properties of a special plane which we call the trifocal plane. Though the trinocular geometry of points and lines has been very recently addressed, our use of the differential properties of curves for prediction is unique. We thus completely solve the following problem: given two views of an object, predict what a third view would look like. The problem and its solution bear upon several areas of computer vision, stereo, motion analysis, and model-based object recognition. Our answer is quite general since it assumes the general perspective projection model for image formation and requires only the knowledge of the epipolar geometry for the triple of views. We show that in the special case of orthographic projection our results for points reduce to those of Ullman and Basri (1991). We demonstrate on synthetic as well as on real data the applicability of our theory.
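The point-prediction idea can be sketched geometrically (synthetic calibrated cameras and our own notation, a cruder cousin of the paper's bilinear formulation): the point's position in the third image lies on the epipolar lines induced by its images in the first two views, so away from degeneracies it is their intersection.

```python
import numpy as np

def skew(t):
    # cross-product matrix [t]_x
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def roty(a):
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

R2, t2 = roty(0.2), np.array([1.0, 0.1, 0.0])     # camera 2 pose
R3, t3 = roty(-0.3), np.array([0.2, 1.0, 0.1])    # camera 3 pose
F13 = skew(t3) @ R3                               # fundamental matrix 1 -> 3
R23 = R3 @ R2.T                                   # relative pose 2 -> 3
F23 = skew(t3 - R23 @ t2) @ R23                   # fundamental matrix 2 -> 3

X = np.array([0.3, -0.2, 4.0])                    # a generic 3D point
x1 = X / X[2]                                     # view 1: P = [I | 0]
x2 = R2 @ X + t2; x2 = x2 / x2[2]
x3 = R3 @ X + t3; x3 = x3 / x3[2]

x3_pred = np.cross(F13 @ x1, F23 @ x2)            # intersect the epipolar lines
x3_pred = x3_pred / x3_pred[2]
```

The predicted point matches the true projection in the third view; the degenerate configuration, where the two epipolar lines coincide, is exactly the trifocal-plane case the paper analyzes.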

Journal ArticleDOI
TL;DR: In this paper, a computational theory for locating human faces in scenes with certain constraints is presented, where people's faces are the primary subject of the scene, occlusion is minimal, and the faces contrast well against the background.
Abstract: The human face is an object that is easily located in complex scenes by infants and adults alike. Yet the development of an automated system to perform this task is extremely challenging. An attempt to solve this problem raises two important issues in object location. First, natural objects such as human faces tend to have boundaries which are not exactly described by analytical functions. Second, the object of interest (face) could occur in a scene in various sizes, thus requiring the use of scale-independent techniques which can detect instances of the object at all scales. Although the task of identifying a well-framed face (as one of a set of labeled faces) has been well researched, the task of locating a face in a natural scene is relatively unexplored. We present a computational theory for locating human faces in scenes with certain constraints. The theory will be validated by experiments confined to instances where people's faces are the primary subject of the scene, occlusion is minimal, and the faces contrast well against the background.

Journal ArticleDOI
TL;DR: An algorithm to extract the extremal mesh from 3D images is presented, and experiments with synthetic and real 3D medical images show that this graph can be extremely precise and stable.
Abstract: This paper is about a new concept for the description of 3D smooth surfaces: the extremal mesh. In previous work, we have shown how to extract the extremal lines from 3D images, which are the lines where one of the two principal surface curvatures is locally extremal. We have also shown how to extract the extremal points, which are specific points where the two principal curvatures are both extremal. The extremal mesh is the graph of the surface whose vertices are the extremal points and whose edges are the extremal lines: it is invariant with respect to rigid transforms. The good topological properties of this graph are ensured with a new local geometric invariant of 3D surfaces, which we call the Gaussian extremality and which allows us to overcome the orientation problems encountered with previous definitions of the extremal lines and points. This paper also presents an algorithm to extract the extremal mesh from 3D images, and experiments with synthetic and real 3D medical images show that this graph can be extremely precise and stable.
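Since the extremal lines and points are defined through extrema of the two principal curvatures, it may help to recall how those curvatures are computed. The sketch below does this for a surface given as a height map z = h(u, v) (a Monge patch), via the first and second fundamental forms; the finite-difference scheme and the test surface are illustrative choices, not the paper's volumetric-image method:

```python
import numpy as np

def principal_curvatures(h, u, v, eps=1e-4):
    # first and second derivatives of the height function by central differences
    hu = (h(u + eps, v) - h(u - eps, v)) / (2 * eps)
    hv = (h(u, v + eps) - h(u, v - eps)) / (2 * eps)
    huu = (h(u + eps, v) - 2 * h(u, v) + h(u - eps, v)) / eps**2
    hvv = (h(u, v + eps) - 2 * h(u, v) + h(u, v - eps)) / eps**2
    huv = (h(u + eps, v + eps) - h(u + eps, v - eps)
           - h(u - eps, v + eps) + h(u - eps, v - eps)) / (4 * eps**2)
    # first (E, F, G) and second (L, M, N) fundamental forms of the patch
    E, F, G = 1 + hu**2, hu * hv, 1 + hv**2
    w = np.sqrt(1 + hu**2 + hv**2)
    L, M, N = huu / w, huv / w, hvv / w
    # Gaussian curvature K and mean curvature H give the principal curvatures
    K = (L * N - M**2) / (E * G - F**2)
    H = (E * N - 2 * F * M + G * L) / (2 * (E * G - F**2))
    d = np.sqrt(max(H * H - K, 0.0))
    return H - d, H + d

# paraboloid z = (0.5 u^2 + 2 v^2) / 2: curvatures at the origin are 0.5 and 2
print(principal_curvatures(lambda u, v: (0.5 * u**2 + 2.0 * v**2) / 2, 0.0, 0.0))
# ≈ (0.5, 2.0)
```

On an iso-surface of a 3D image, as in the paper, the curvatures would instead be obtained from partial derivatives of the image intensity up to second order; the fundamental-form route above is the parametric analogue.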

Journal ArticleDOI
TL;DR: This paper addresses the problem of computing structure and motion, given a set of point and/or line correspondences in a monocular image sequence when the camera is not calibrated, and parameterizes the retinal correspondences in the general projective case.
Abstract: In the present paper we address the problem of computing structure and motion, given a set of point and/or line correspondences, in a monocular image sequence, when the camera is not calibrated. Considering point correspondences first, we analyse how to parameterize the retinal correspondences as a function of the chosen geometry: Euclidean, affine or projective. The simplest of these parameterizations is called the FQs-representation and is a composite projective representation. The main result is that, considering N+1 views in such a monocular image sequence, the retinal correspondences are parameterized by 11N−4 parameters in the general projective case. Moreover, 3 other parameters are required to work in the affine case and 5 additional parameters in the Euclidean case. These 8 parameters are "calibration" parameters and must be calculated from at least 8 external pieces of information or constraints. The method being constructive, all these representations are made explicit. Then, considering line correspondences, we show how the same parameterizations can be used when we analyse the motion of lines in the uncalibrated case. The case of three views is studied extensively and a geometrical interpretation is proposed, introducing the notion of trifocal geometry, which generalizes the well-known epipolar geometry. We also discuss how to introduce line correspondences into a framework based on point correspondences, using the same equations. Finally, considering the FQs-representation, one implementation is proposed as a "motion module", taking retinal correspondences as input and providing an estimate of the 11N−4 retinal motion parameters. As discussed in this paper, this module can also estimate the 3D depth of the points, up to the affine or projective transformation defined by the 8 parameters identified in the first section. Experimental results are provided.

Journal ArticleDOI
TL;DR: The model may be used, for example, to predict the performance of a given stereo ground-plane prediction system or a monocular drivable-region detection system on an AGV, and in reverse to determine the camera resolution required if a vehicle in motion is to resolve obstacles of a given height at a given distance from it.
Abstract: This paper presents a means of segmenting planar regions from two views of a scene using point correspondences. The initial selection of groups of coplanar points is performed on the basis of the conservation of two five-point projective invariants (groups for which these invariants are conserved are assumed to be coplanar). The five point correspondences are used to estimate a projectivity, which is then used to predict the change in position of other points under the assumption that they lie on the same plane as the original four. The variance in any point's new position is used to define a distance threshold between actual and predicted position, which serves as a coplanarity test for finding extended planar regions. If two distinct planar regions can be found, then a novel motion direction estimator suggests itself. The projection of the line of intersection of two planes in an image may also be recovered. An analytical error model is derived which relates image uncertainty in a corner's position to the genuine perpendicular height of a point above a given plane in the world. The model may be used, for example, to predict the performance of a given stereo ground-plane prediction system or a monocular drivable-region detection system on an AGV. The model may also be used in reverse to determine the camera resolution required if a vehicle in motion is to resolve obstacles of a given height at a given distance from it.
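The prediction step can be sketched as follows: a projectivity (plane homography) estimated from a few correspondences by the standard DLT predicts where any coplanar point must appear in the second view, and a distance threshold on the prediction residual acts as the coplanarity test. The projectivity and the points below are invented for illustration:

```python
import numpy as np

def fit_homography(src, dst):
    # direct linear transform: each correspondence contributes two rows of A h = 0
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    H = np.linalg.svd(np.array(A))[2][-1].reshape(3, 3)
    return H / H[2, 2]

def predict(H, p):
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# an invented projectivity standing in for the plane-induced inter-image map
H_true = np.array([[1.1, 0.05, 3.0], [-0.02, 0.95, -2.0], [0.001, 0.002, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]
dst = [predict(H_true, p) for p in src]
H = fit_homography(src, dst)

# coplanarity test: a small prediction residual means the point is on-plane
on_plane = np.linalg.norm(predict(H, (0.3, 0.7)) - predict(H_true, (0.3, 0.7)))
off_plane = np.linalg.norm(predict(H, (0.3, 0.7))
                           - (predict(H_true, (0.3, 0.7)) + [0.5, -0.3]))
print(on_plane < 0.01, off_plane < 0.01)   # True False
```

In practice the threshold is set from the estimated variance of each point's predicted position, as in the paper, rather than from a fixed constant.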

Journal ArticleDOI
TL;DR: This work proposes iterative algorithms for surface reconstruction that recover high-resolution albedo given high-resolution height, and vice versa.
Abstract: Given a set of low resolution camera images of a Lambertian surface, it is possible to reconstruct high resolution luminance and height information when the relative displacements of the image frames are known. We propose iterative algorithms for recovering high resolution albedo with knowledge of the high resolution height, and vice versa. The problem of surface reconstruction is tackled in a Bayesian framework and formulated as one of minimizing an error function. Markov Random Fields (MRFs) are employed to characterize the a priori constraints on the solution space. As for the surface height, we attempt a direct computation without referring to surface orientations, while increasing the resolution by camera jittering.
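A one-dimensional toy version of the reconstruction may clarify the iteration. Each low-resolution frame is modelled as a shifted, block-averaged copy of the unknown high-resolution signal, and gradient descent on the data misfit plus a quadratic MRF-style smoothness prior recovers it. The signal, the integer shifts, and the weights are all invented; the paper's method works on 2D images with sub-pixel camera jitter:

```python
import numpy as np

def downsample(x, shift, factor=2):
    # camera model: integer-shift the high-res signal, then average blocks
    return np.roll(x, shift).reshape(-1, factor).mean(axis=1)

n = 16
truth = np.sin(2 * np.pi * np.arange(n) / n)   # unknown high-res signal
shifts = [0, 1]                                # known relative displacements
obs = [downsample(truth, s) for s in shifts]   # the low-resolution frames

est = np.repeat(obs[0], 2)                     # crude initial high-res guess
lam, step = 0.05, 0.5
for _ in range(300):
    # gradient of the MRF smoothness prior (sum of squared first differences)
    grad = lam * (2 * est - np.roll(est, 1) - np.roll(est, -1))
    for s, y in zip(shifts, obs):
        r = downsample(est, s) - y             # residual in the low-res frame
        grad += np.roll(np.repeat(r, 2) / 2, -s)   # adjoint of shift + average
    est -= step * grad
print(np.max(np.abs(est - truth)))             # small reconstruction error
```

The prior matters: block averaging alone leaves a high-frequency component of the signal undetermined, and the smoothness term is what pins it down.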

Journal ArticleDOI
TL;DR: The paper concludes with an algorithm for constructing optimal prototypes for classes of objects, which combines the correspondence and object pose computed in the categorization stage to align the prototype with the image with the result that identification is reduced to a series of simple template comparisons.
Abstract: A scheme for recognizing 3D objects from single 2D images under orthographic projection is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a class is very low, because correspondence is computed once for the whole class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series of simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.

Journal ArticleDOI
TL;DR: This work has developed a methodology for rapid active detection and classification of junctions by selection of fixation points based on direct computations from image data and allows integration of stereo and accommodation cues with luminance information.
Abstract: It is well known that active selection of fixation points in humans is highly context and task dependent. It is therefore likely that successful computational processes for fixation in active vision should be so too. We consider active fixation in the context of recognition of man-made objects characterized by their shapes. In this situation the qualitative shape and type of observed junctions play an important role. The fixations are driven by a grouping strategy, which forms sets of connected junctions separated from their surroundings at depth discontinuities. We have furthermore developed a methodology for rapid active detection and classification of junctions by selection of fixation points. The approach is based on direct computations from image data and allows integration of stereo and accommodation cues with luminance information. This work forms part of an effort to perform active recognition of generic objects, in the spirit of Malik and Biederman, but on real imagery rather than on line drawings.

Journal ArticleDOI
TL;DR: In this article, a non-frontal imaging camera (NICAM) is proposed for active sensing of range information from focus, where the sensor plane is not perpendicular to the optical axis as is standard.
Abstract: This paper is concerned with active sensing of range information from focus. It describes a new type of camera whose sensor plane is not perpendicular to the optical axis as is standard. This special imaging geometry eliminates sensor plane movement usually necessary for focusing. Camera panning, required for panoramic viewing anyway, in addition enables focusing and range estimation. Thus panning integrates the two standard mechanical actions of focusing and panning, implying range estimation is done at the speed of panning. An implementation of the proposed Non-frontal Imaging Camera (NICAM) design is described. Experiments on range estimation are also presented.
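The geometric idea behind the design can be sketched with the thin-lens law: tilting the sensor places different image columns at different lens-to-sensor distances v, so each column is in focus at a different scene depth u, and panning sweeps this focus range across the scene. The focal length, tilt, and sensor extent below are invented values:

```python
import numpy as np

f = 0.05        # focal length in metres (an assumed 50 mm lens)
v0 = 0.055      # lens-to-sensor distance at the sensor centre (illustrative)
tilt = 0.1      # sensor tilt in radians (illustrative)

# lens-to-sensor distance varies linearly across a tilted sensor
x = np.linspace(-0.01, 0.01, 5)          # positions across the sensor
v = v0 + x * np.tan(tilt)

# thin-lens law 1/f = 1/u + 1/v gives the scene depth in focus per column
u = 1.0 / (1.0 / f - 1.0 / v)
print(u)        # each column is sharply focused at a different depth
```

Ranging then amounts to noting, for each scene point, the pan position at which it appears sharpest and reading off the corresponding depth.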

Journal ArticleDOI
TL;DR: This work proposes a promising methodology, based on exploiting mathematical invariant properties of the contours of generalized cylinders in a perceptual grouping approach, to address this problem for a large class of objects, and shows that using these properties greatly helps address the figure-ground problem in a more rigorous way than previous (intuitive) perceptual grouping methods.
Abstract: Since the early days of computer vision research, shape from contour has been one of the most challenging problems. Many researchers in the field have attempted to understand this problem and proposed different approaches to solve it. Shape from contour still remains one of the hardest problems in the field. The problem has two major difficulties. First, 2D properties of contours of viewed objects are generally not sufficient by themselves to uniquely determine 3D shape, as one dimension is lost in the projection. Second, real images produce imperfect contours that make their interpretation particularly difficult. The first problem has received some attention in the research community, but in the context of perfect contours. The second one, however, has received very little. In this work, we propose a promising methodology to address this last problem for a large class of objects: generalized cylinders. It is based on exploiting mathematical invariant properties of the contours of generalized cylinders in a perceptual grouping approach. We show that using these properties greatly helps address the figure-ground problem in a more rigorous way than previous (intuitive) perceptual grouping methods. Our approach exploits the interplay between local and global features by handling different levels of the feature hierarchy. We have developed and implemented a method that handles straight homogeneous generalized cylinders (SHGCs) in complex scenes with markings and occlusion. We demonstrate the application of our method of shape description and scene segmentation on complex real images. We also demonstrate the usage of the obtained descriptions for recovery of complete 3D object-centered descriptions of viewed objects from a single intensity image.

Journal ArticleDOI
TL;DR: This work addresses the well-known problem of estimating the motion and structure of a plane, but in the case where the visual system is not calibrated and in a monocular image-sequence, by developing and illustrating a method to estimate robustly any collineation in the image.
Abstract: We address the well-known problem of estimating the motion and structure of a plane, but in the case where the visual system is not calibrated and in a monocular image-sequence.

Journal ArticleDOI
TL;DR: This paper studies in detail the structure of the ridge lines, crest lines, and sub-parabolic lines on a generic surface, and on a surface evolving in a generic (one-parameter) family.
Abstract: The ridge lines on a surface can be defined either via contact of the surface with spheres, or via extrema of principal curvatures along lines of curvature. Certain subsets of ridge lines called crest lines have been singled out by some authors for medical imaging applications. There is a related concept of sub-parabolic line on a surface, also defined via extrema of principal curvatures. In this paper we study in detail the structure of the ridge lines, crest lines and sub-parabolic lines on a generic surface, and on a surface which is evolving in a generic (one-parameter) family. The mathematical details of this study are in Bruce et al. (1994c).