
Showing papers in "International Journal of Computer Vision in 1991"


Journal ArticleDOI
TL;DR: Color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms, and a fast incremental version of Histogram Intersection that allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.

5,672 citations
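
For concreteness, here is a minimal numpy sketch of the normalized Histogram Intersection score described above (the bin-wise-minimum formula is Swain and Ballard's; the toy image, bin counts, and names are illustrative):

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Normalized histogram intersection: sum of bin-wise minima,
    normalized by the model histogram's mass. Score lies in [0, 1]."""
    return np.minimum(image_hist, model_hist).sum() / model_hist.sum()

# Toy usage: 8-bins-per-axis RGB histograms of two images (hypothetical data).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64, 3))
model = img.copy()
bins = (8, 8, 8)
h_img, _ = np.histogramdd(img.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
h_model, _ = np.histogramdd(model.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
print(histogram_intersection(h_img, h_model))  # 1.0 for identical images
```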


Journal ArticleDOI
TL;DR: The least-median-of-squares (LMedS) method, which yields the correct result even when half of the data is severely corrupted, is described and compared with the class of robust M-estimators.
Abstract: Regression analysis (fitting a model to noisy data) is a basic technique in computer vision. Robust regression methods that remain reliable in the presence of various types of noise are therefore of considerable importance. We review several robust estimation techniques and describe in detail the least-median-of-squares (LMedS) method. The method yields the correct result even when half of the data is severely corrupted. Its efficiency in the presence of Gaussian noise can be improved by complementing it with a weighted least-squares-based procedure. The high time-complexity of the LMedS algorithm can be reduced by a Monte Carlo type speed-up technique. We discuss the relationship of LMedS with the RANSAC paradigm and its limitations in the presence of noise corrupting all the data, and we compare its performance with the class of robust M-estimators. References to published applications of robust techniques in computer vision are also given.

653 citations
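
A minimal Monte Carlo LMedS sketch for line fitting (the sample count, subset size, and data are illustrative, not from the paper): draw minimal two-point subsets, fit an exact line to each, and keep the candidate minimizing the median of squared residuals.

```python
import numpy as np

def lmeds_line(x, y, n_samples=500, rng=None):
    """Monte Carlo LMedS fit of y = a*x + b: minimize the MEDIAN
    (not the sum) of squared residuals over random minimal subsets."""
    rng = np.random.default_rng(rng)
    best, best_med = None, np.inf
    for _ in range(n_samples):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # degenerate subset, skip
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        med = np.median((y - (a * x + b)) ** 2)
        if med < best_med:
            best, best_med = (a, b), med
    return best

# Usage: 40% of the data grossly corrupted; LMedS still recovers the line.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + 0.05 * rng.standard_normal(100)
y[:40] = rng.uniform(-50, 50, 40)   # gross outliers
print(lmeds_line(x, y))             # close to (2, 1)
```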


Journal ArticleDOI
TL;DR: Direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known are developed.
Abstract: We have developed direct methods for recovering the motion of an observer in a static environment in the case of pure rotation, pure translation, and arbitrary motion when the rotation is known. Some of these methods are based on the minimization of the difference between the observed time derivative of brightness and that predicted from the spatial brightness gradient, given the estimated motion. We minimize the integral of the square of this difference taken over the image region of interest. Other methods presented here exploit the fact that surfaces have to be in front of the observer in order to be seen. We do not establish point correspondences, nor do we estimate the optical flow. We use only first-order derivatives of the image brightness, and we do not assume an analytic form for the surface. We show that the field of view should be large to accurately recover the components of motion in the direction toward the image region. We also demonstrate the importance of points where the time derivative of brightness is small and discuss difficulties resulting from very large depth ranges. We emphasize the need for adequate filtering of the image data before sampling to avoid aliasing, in both the spatial and temporal dimensions.

379 citations
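
A toy instance of the direct least-squares idea (restricted, for brevity, to a single global image-plane translation rather than the paper's rotation/translation cases; data and names are mine): only first-order brightness derivatives are used, and no correspondences or flow field are computed.

```python
import numpy as np

def direct_translation(E0, E1):
    """Recover one image-plane translation (u, v) by minimizing
    sum (Ex*u + Ey*v + Et)^2, i.e., the squared difference between
    the observed temporal brightness derivative and the one
    predicted from the spatial gradient."""
    Ey, Ex = np.gradient(E0)           # spatial brightness gradient
    Et = E1 - E0                       # temporal derivative (unit time step)
    A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)
    uv, *_ = np.linalg.lstsq(A, -Et.ravel(), rcond=None)
    return uv

# Usage: shift a smooth pattern by one pixel and recover the motion.
x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 128), np.linspace(0, 4 * np.pi, 128))
E0 = np.sin(x) * np.cos(y)
E1 = np.roll(E0, shift=1, axis=1)      # translate 1 pixel along x
print(direct_translation(E0, E1))      # approximately [1, 0]
```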


Journal ArticleDOI
TL;DR: In this article, a Markov random field (MRF) formalism is proposed to unify several approaches to image segmentation in early vision under a common framework, in which the probability distributions are specified by an energy function.
Abstract: We attempt to unify several approaches to image segmentation in early vision under a common framework. The Bayesian approach is very attractive since: (i) it enables the assumptions used to be explicitly stated in the probability distributions, and (ii) it can be extended to deal with most other problems in early vision. Here, we consider the Markov random field formalism, a special case of the Bayesian approach, in which the probability distributions are specified by an energy function. We show that: (i) our discrete formulation for the energy function is closely related to the continuous formulation; (ii) under the mean field (MF) theory approach introduced by Geiger and Girosi [1991], several previous attempts to solve these energy functions are effectively equivalent; (iii) by varying the parameters of the energy functions we can obtain connections to nonlinear diffusion and minimal description length approaches to image segmentation; and (iv) simple modifications to the energy can give a direct relation to robust statistics or can encourage hysteresis and nonmaximum suppression.

174 citations
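
As a minimal sketch of such an energy function (the membrane-style quadratic MRF without a line process; the lambda value, Jacobi scheme, and data are illustrative, not the paper's):

```python
import numpy as np

def smooth_mrf(d, lam=4.0, iters=200):
    """Jacobi iteration minimizing the quadratic MRF energy
        E(f) = sum_i (f_i - d_i)^2 + lam * sum_<ij> (f_i - f_j)^2 .
    Each update balances the data term against the 4-neighborhood
    average (periodic boundaries via np.roll, for brevity)."""
    f = d.copy()
    for _ in range(iters):
        nbr_sum = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                   np.roll(f, 1, 1) + np.roll(f, -1, 1))
        f = (d + lam * nbr_sum) / (1.0 + 4.0 * lam)
    return f

# Usage: denoise a noisy step image (toy data).
rng = np.random.default_rng(2)
truth = np.zeros((64, 64)); truth[:, 32:] = 1.0
noisy = truth + 0.3 * rng.standard_normal(truth.shape)
print(np.abs(smooth_mrf(noisy) - truth).mean())  # below the input noise level
```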


Journal ArticleDOI
TL;DR: A program is presented that emulates the human vision system's ability to interpret two-dimensional images as three-dimensional objects for the case of images consisting of line-drawings and provides an explanation of the Necker cube illusion.
Abstract: The human vision system has the ability to interpret two-dimensional images as three-dimensional objects. In this article, we present a program that emulates this ability for the case of images consisting of line-drawings. As a by-product of the approach, we provide an explanation of the Necker cube illusion.

157 citations


Journal ArticleDOI
TL;DR: This article examines how sensor values are modified in the mutual reflection region and shows that a good approximation of the surface spectral reflectance function for each surface can be recovered by using the extra information from mutual reflection.
Abstract: Mutual reflection occurs when light reflected from one surface illuminates a second surface. In this situation, the color of one or both surfaces can be modified by a color-bleeding effect. In this article we examine how sensor values (e.g., RGB values) are modified in the mutual reflection region and show that a good approximation of the surface spectral reflectance function for each surface can be recovered by using the extra information from mutual reflection. Thus color constancy results from an examination of mutual reflection. Use is made of finite-dimensional linear models for ambient illumination and for surface spectral reflectance. If m and n are the number of basis functions required to model illumination and surface spectral reflectance respectively, then we find that the number of different sensor classes p must satisfy the condition p ≥ (2n + m)/3. If we use three basis functions to model illumination and three basis functions to model surface spectral reflectance, then only three classes of sensors are required to carry out the algorithm. Results are presented showing a small increase in error over the error inherent in the underlying finite-dimensional models.

142 citations
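
A plausible reconstruction of the counting argument behind the condition p ≥ (2n + m)/3 (my sketch, not the paper's derivation): the unknowns are two surface-reflectance expansions (n coefficients each) plus one illumination expansion (m coefficients), while each of the p sensor classes yields a measurement in each of three regions — surface 1 alone, surface 2 alone, and the mutual-reflection area.

```latex
\underbrace{3p}_{\text{measurements}} \ \ge\ \underbrace{2n + m}_{\text{unknowns}}
\quad\Longrightarrow\quad p \ \ge\ \frac{2n+m}{3};
\qquad m = n = 3 \ \Rightarrow\ p \ge 3 .
```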


Journal ArticleDOI
TL;DR: It is argued that most images are effectively impossible, with no corresponding physically reasonable surface, and that any image can be rendered effectively impossible by a small perturbation of its intensities.
Abstract: For general images of smooth objects wholly contained in the field of view, and for illumination symmetric around the viewing direction, it is proven that shape is uniquely determined by shading. Thus, shape from shading is a well-posed problem under these illumination conditions; and regularization is unnecessary for surface reconstruction and should be avoided. Generic properties of surfaces and images are established. Questions of existence are also discussed. Under the conditions above, it is argued that most images are effectively impossible, with no corresponding physically reasonable surface, and that any image can be rendered effectively impossible by a small perturbation of its intensities. This is explicitly illustrated for a synthetic image. The proofs are based on ideas of dynamical systems theory and global analysis.

127 citations


Journal ArticleDOI
TL;DR: This paper presents an algorithm based on multiple frames that employs only the rigidity assumption, is simple and mathematically elegant and, experimentally, proves to be a major improvement over the two-frame algorithms.
Abstract: One of the main issues in the area of motion estimation given the correspondences of some features in a sequence of images is sensitivity to error in the input. The main way to attack the problem, as with several other problems in science and engineering, is redundancy in the data. Up to now all the algorithms developed either used two frames or depended on assumptions about the motion or the shape of the scene. We present in this paper an algorithm based on multiple frames that employs only the rigidity assumption, is simple and mathematically elegant and, experimentally, proves to be a major improvement over the two-frame algorithms. The algorithm minimizes the squared error, which we prove equivalent to an eigenvalue minimization problem. One of the side effects of this mean-square method is that the algorithm can have a very descriptive physical interpretation in terms of the “loaded spring model.”

124 citations
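
The generic form of the "squared error to eigenvalue minimization" reduction mentioned above is worth spelling out (the matrix below is random; the paper's matrix is built from the multi-frame correspondences): minimizing a quadratic form over unit vectors is solved by the eigenvector of the smallest eigenvalue.

```python
import numpy as np

def min_quadratic_on_sphere(A):
    """Minimize x^T A x subject to ||x|| = 1 for symmetric A:
    the minimizer is the eigenvector of the smallest eigenvalue."""
    w, V = np.linalg.eigh(A)        # eigenvalues in ascending order
    return w[0], V[:, 0]

# Usage with a random symmetric positive semidefinite matrix.
rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
A = B.T @ B
val, x = min_quadratic_on_sphere(A)
print(val, x @ A @ x)               # both equal the smallest eigenvalue
```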


Journal ArticleDOI
TL;DR: Two complementary methods for the detection of moving objects by a moving observer based on the fact that, in a rigid environment, the projected velocity at any point in the image is constrained to lie on a 1-D locus in velocity space whose parameters depend only on the observer motion are described.
Abstract: Two complementary methods for the detection of moving objects by a moving observer are described. The first is based on the fact that, in a rigid environment, the projected velocity at any point in the image is constrained to lie on a 1-D locus in velocity space whose parameters depend only on the observer motion. If the observer motion is known, an independently moving object can, in principle, be detected because its projected velocity is unlikely to fall on this locus. We show how this principle can be adapted to use partial information about the motion field and observer motion that can be rapidly computed from real image sequences. The second method utilizes the fact that the apparent motion of a fixed point due to smooth observer motion changes slowly, while the apparent motion of many moving objects such as animals or maneuvering vehicles may change rapidly. The motion field at a given time can thus be used to place constraints on the future motion field which, if violated, indicate the presence of an autonomously maneuvering object. In both cases, the qualitative nature of the constraints allows the methods to be used with the inexact motion information typically available from real image sequences. Implementations of the methods that run in real time on a parallel pipelined image processing system are described.

122 citations
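
A toy version of the first method, specialized to a purely translating observer (the focus of expansion, threshold, and data below are assumed, not from the paper): rigid-scene image motion at each pixel must lie along the ray from the focus of expansion through that pixel, so flow vectors that deviate from this 1-D constraint are flagged.

```python
import numpy as np

def flag_independent_motion(points, flows, foe, angle_thresh_deg=20.0):
    """Flag flow vectors whose direction deviates from the ray
    FOE -> pixel by more than a threshold (|cos| ignores the
    toward/away distinction)."""
    rays = points - foe
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    dirs = flows / np.linalg.norm(flows, axis=1, keepdims=True)
    cos = np.abs((rays * dirs).sum(axis=1))
    return cos < np.cos(np.radians(angle_thresh_deg))

# Usage: two rigid-scene flows plus one violator (hypothetical numbers).
pts = np.array([[10.0, 0.0], [0.0, 20.0], [15.0, 15.0]])
foe = np.array([0.0, 0.0])
flows = np.array([[2.0, 0.0], [0.0, 1.5], [1.0, -1.0]])  # last one is off-ray
print(flag_independent_motion(pts, flows, foe))  # [False False  True]
```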


Journal ArticleDOI
TL;DR: It is shown that the solutions to the minimization problems of Horn & Schunck (1981) and Nagel (1987) exist, are unique, and depend continuously on the input data; these properties make it possible to approximate efficiently the (weak) solutions of the associated boundary-value problems in irregularly shaped domains using finite elements.

Abstract: Snyder (1989) has recently classified all smoothness terms which involve first-order derivatives of the flow field u(x, t) and of the image grey-value function g(x, t). The physically plausible smoothness terms belonging to this class are known from the work of Horn and Schunck (1981) and Nagel (1987). In this paper we discuss the possibilities of approximating the solutions to the minimization problems of Horn & Schunck (1981) and Nagel (1987). In particular, it is shown that these solutions exist, are unique, and depend continuously on the input data. These properties make it possible, while taking into consideration arbitrary models of the grey-value function, to approximate efficiently the (weak) solutions of the associated boundary-value problems in irregularly shaped domains (with a “sufficiently smooth” boundary) using finite elements. Experiments with image sequences from synthetic as well as outdoor scenes show how the orientation dependency of the smoothness term in Nagel's approach influences the results.

111 citations
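
For reference, the classic Horn-Schunck iteration whose well-posedness is analyzed above (a minimal sketch with periodic boundaries via np.roll; the alpha value, iteration count, and test pattern are illustrative):

```python
import numpy as np

def horn_schunck(E0, E1, alpha=1.0, iters=500):
    """Minimize the brightness-constancy error plus alpha^2 times the
    first-order smoothness term, using the standard fixed-point update."""
    Ey, Ex = np.gradient(E0)
    Et = E1 - E0
    u = np.zeros_like(E0); v = np.zeros_like(E0)
    for _ in range(iters):
        # local flow averages over the 4-neighborhood
        ubar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        vbar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        t = (Ex * ubar + Ey * vbar + Et) / (alpha**2 + Ex**2 + Ey**2)
        u = ubar - Ex * t
        v = vbar - Ey * t
    return u, v

# Usage: one-pixel shift of a smooth pattern.
x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 64), np.linspace(0, 4 * np.pi, 64))
E0 = np.sin(x) * np.cos(y); E1 = np.roll(E0, 1, axis=1)
u, v = horn_schunck(E0, E1)
print(u.mean(), v.mean())   # mean flow approaches roughly (1, 0)
```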


Journal ArticleDOI
TL;DR: This work proposes an adaptive multiscale method, in which the discretization scale is chosen locally according to an estimate of the relative error in the velocity estimation, based on image properties; it provides substantially better estimates of optical flow than conventional algorithms while adding little computational cost.

Abstract: Single-scale approaches to the determination of the optical flow field from the time-varying brightness pattern assume that spatio-temporal discretization is adequate for representing the patterns and motions in a scene. However, the choice of an appropriate spatial resolution is subject to conflicting, scene-dependent constraints. In intensity-based methods for recovering optical flow, derivative estimation is more accurate for long wavelengths and slow velocities (with respect to the spatial and temporal discretization steps). Conversely, short wavelengths and fast motions are required in order to reduce the errors caused by noise in the image acquisition and quantization process. Estimating motion across different spatial scales should ameliorate this problem. However, homogeneous multiscale approaches, such as the standard multigrid algorithm, do not improve this situation, because an optimal velocity estimate at a given spatial scale is likely to be corrupted at a finer scale. We propose an adaptive multiscale method, where the discretization scale is chosen locally according to an estimate of the relative error in the velocity estimation, based on image properties. Results for synthetic and video-acquired images show that our coarse-to-fine method, fully parallel at each scale, provides substantially better estimates of optical flow than do conventional algorithms, while adding little computational cost.
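
The benefit of coarse-to-fine estimation can be seen in a toy stand-in (this sketch estimates a single global translation with a fixed scale schedule, whereas the paper's method is per-pixel and adaptive; all names and data are mine): a motion too fast for the derivative-based estimator at full resolution falls within range at coarse scales, and finer scales then refine the estimate.

```python
import numpy as np

def downsample(im):
    """2x2 block average: one pyramid level (crude low-pass + subsample)."""
    h, w = (im.shape[0] // 2) * 2, (im.shape[1] // 2) * 2
    im = im[:h, :w]
    return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2])

def multiscale_translation(E0, E1, levels=4):
    """Coarse-to-fine estimation of one global translation (u, v)."""
    pyr = [(E0, E1)]
    for _ in range(levels - 1):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    uv = np.zeros(2)
    for a, b in reversed(pyr):          # coarsest level first
        uv *= 2.0                       # rescale displacement to this level
        shift = np.round(uv).astype(int)
        # undo the current estimate (integer warp), solve for the residual
        b_w = np.roll(b, (-shift[1], -shift[0]), axis=(0, 1))
        Ey, Ex = np.gradient(a)
        Et = b_w - a
        A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)
        d, *_ = np.linalg.lstsq(A, -Et.ravel(), rcond=None)
        uv = shift + d
    return uv

# Usage: a 10-pixel shift, too large for a single-scale gradient method.
x, y = np.meshgrid(np.linspace(0, 8 * np.pi, 256), np.linspace(0, 8 * np.pi, 256))
E0 = np.sin(x) * np.cos(y)
E1 = np.roll(E0, (0, 10), axis=(0, 1))
print(multiscale_translation(E0, E1))   # close to [10, 0]
```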

Journal ArticleDOI
TL;DR: A general approach to vergence control is outlined, consisting of a control loop driven by an algorithm that estimates the vergence error and is used to verge the eyes of the Rochester Robot in real time.
Abstract: In binocular systems, vergence is the process of adjusting the angle between the eyes (or cameras) so that both eyes are directed at the same world point. Its utility is most obvious for foveate systems such as the human visual system, but it is a useful strategy for nonfoveate binocular robots as well. Here, we discuss the vergence problem and outline a general approach to vergence control, consisting of a control loop driven by an algorithm that estimates the vergence error. As a case study, this approach is used to verge the eyes of the Rochester Robot in real time. Vergence error is estimated with the cepstral disparity filter. The cepstral filter is analyzed, and it is shown in this application to be equivalent to correlation with an adaptive prefilter; carrying this idea to its logical conclusion converts the cepstral filter into phase correlation. The demonstration system uses a PD controller in cascade with the error estimator. An efficient real-time implementation of the error estimator is discussed, and empirical measures of the performance of both the disparity estimator and the overall system are presented.
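
A 1-D sketch of the cepstral disparity idea (window sizes, FFT length, and data are illustrative; the real filter operates on 2-D image windows): concatenating the left and right windows produces a signal plus an echo of itself, and the echo lag, read off as a peak in the power cepstrum, encodes the disparity.

```python
import numpy as np

def cepstral_disparity(left, right, max_disp=16):
    """If right is (approximately) left shifted by d samples, the
    concatenation [left, right] contains an echo at lag n - d, which
    appears as a peak in the power cepstrum at that lag."""
    n = len(left)
    s = np.concatenate([left, right])
    spec = np.abs(np.fft.fft(s, 4 * n)) ** 2
    ceps = np.abs(np.fft.ifft(np.log(spec + 1e-12)))
    lags = np.arange(n - max_disp, n + max_disp + 1)
    # recover d = n - lag; the sign convention depends on the geometry
    return n - lags[np.argmax(ceps[lags])]

# Usage: two 64-sample windows of a random texture, offset by 5 pixels.
rng = np.random.default_rng(4)
line = rng.standard_normal(128)
print(cepstral_disparity(line[10:74], line[15:79]))   # 5
```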

Journal ArticleDOI
TL;DR: This work proposes these augmented HMMs as a theory of adaptive skill acquisition and generation, and gives an example, the what-where-AHMM, which creates a hybrid skill from separate skills based on object location and object identity.
Abstract: Advances in technology and in active vision research allow and encourage sequential visual information acquisition. Hidden Markov models (HMMs) can represent probabilistic sequences and probabilistic graph structures: here we explore their use in controlling the acquisition of visual information. We include a brief tutorial with two examples: (1) use input sequences to derive an aspect graph and (2) similarly derive a finite state machine for control of visual processing.
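
In the tutorial spirit of the abstract, a tiny forward-algorithm sketch for scoring an observation sequence under an HMM (the two-state model and all probabilities below are made up, not from the paper):

```python
import numpy as np

A = np.array([[0.7, 0.3],   # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # per-state observation likelihoods
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution

def forward(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 1, 0]))
```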

Journal ArticleDOI
TL;DR: A new approach to the visual recognition of cursive handwriting is described by using a method based on pictorial alignment and on a model of the process of handwriting that permits recognition of character instances that appear embedded in connected strings.
Abstract: We describe a new approach to the visual recognition of cursive handwriting. An effort is made to attain human-like performance by using a method based on pictorial alignment and on a model of the process of handwriting. The alignment approach permits recognition of character instances that appear embedded in connected strings. A system embodying this approach has been implemented and tested on five different word sets. The performance was stable both across words and across writers. The system exhibited a substantial ability to interpret cursive connected strings without recourse to lexical knowledge.

Journal ArticleDOI
TL;DR: A new approach to effecting the transition between local and global representations is presented, based on the notion of a covering: a collection of objects whose union is equivalent to the full one. The global dynamic covering is defined by the union of the local dynamic curves.

Abstract: We present a new approach to effect the transition between local and global representations. It is based on the notion of a covering, or a collection of objects whose union is equivalent to the full one. The mathematics of computing global coverings is developed in the context of curve detection, where an intermediate representation (the tangent field) provides a reliable local description of curve structure. This local information is put together globally in the form of a potential distribution. The elements of the covering are then short curves, each of which evolves in parallel to seek the valleys of the potential distribution. The initial curve positions are also derived from the tangent field, and their evolution is governed by variational principles. When stationary configurations are achieved, the global dynamic covering is defined by the union of the local dynamic curves.
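
A crude stand-in for the valley-seeking evolution (the paper evolves short curves under variational principles; here isolated points simply descend the potential by gradient steps with nearest-grid-point lookups, and the potential and step size are made up):

```python
import numpy as np

def descend_potential(P, pts, step=2.0, iters=300):
    """Move each point (row, col) downhill on the potential surface P
    by gradient descent, toward the valleys of the distribution."""
    gy, gx = np.gradient(P)
    pts = pts.astype(float).copy()
    for _ in range(iters):
        i = np.clip(np.round(pts[:, 0]).astype(int), 0, P.shape[0] - 1)
        j = np.clip(np.round(pts[:, 1]).astype(int), 0, P.shape[1] - 1)
        pts[:, 0] -= step * gy[i, j]
        pts[:, 1] -= step * gx[i, j]
    return pts

# Usage: a potential with a valley along the row y = 32; points fall onto it.
yy, xx = np.mgrid[0:64, 0:64]
P = (yy - 32.0) ** 2 / 64.0
start = np.array([[10.0, 20.0], [50.0, 40.0]])
print(descend_potential(P, start))   # row coordinates approach 32
```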

Journal ArticleDOI
TL;DR: An algorithm that uses time-varying image velocity information to compute the observer's translation and rotation and the normalized surface gradient of the 3-D planar surface and an extensive error analysis of the motion and structure problem is presented.
Abstract: This research addresses the problem of noise sensitivity inherent in motion and structure algorithms. The motion and structure paradigm is a two-step process. First, image velocities and, perhaps, their spatial and temporal derivatives are measured from time-varying image intensity data; second, these data are used to compute the motion of a moving monocular observer in a stationary environment under perspective projection, relative to a single 3-D planar surface. The first contribution of this article is an algorithm that uses time-varying image velocity information to compute the observer's translation and rotation and the normalized surface gradient of the 3-D planar surface. The use of time-varying image velocity information is an important tool in obtaining a more robust motion and structure calculation. The second contribution of this article is an extensive error analysis of the motion and structure problem. Any motion and structure algorithm that uses image velocity information as its input should exhibit error sensitivity behavior compatible with the results reported here. We perform an average and worst-case error analysis for four types of image velocity information: full and normal image velocities, and full and normal sets of image velocities and their derivatives. (These derivatives are simply the coefficients of a truncated Taylor series expansion about some point in space and time.) The main issues we address here are: just how sensitive is a motion and structure computation in the presence of noisy input, or alternately, how accurate must our image velocity information be; how much and what type of input data is needed; and under what circumstances is motion and structure estimation feasible? That is, when can we be sure that a motion and structure computation will produce usable results? We base our answers on a numerical error analysis conducted for a large number of motions.
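
For reference, the image velocity field of a rigidly moving planar surface under perspective projection follows the standard eight-parameter quadratic model (notation mine, not the article's); the coefficients are functions of the translation, rotation, and normalized surface gradient, which is why measured velocities and their low-order derivatives suffice, in principle, to solve for motion and structure:

```latex
u(x,y) = a_1 + a_2\,x + a_3\,y + a_7\,x^2 + a_8\,xy, \qquad
v(x,y) = a_4 + a_5\,x + a_6\,y + a_7\,xy + a_8\,y^2 .
```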

Journal ArticleDOI
TL;DR: A new stereo algorithm that produces dense, high-quality, subpixel disparity maps that can reconstruct the correct correspondence between two images even when there is substantial vertical displacement between them and that uses a pre-match filter that prevents two patches of image from matching if they do not have the same topological structure.
Abstract: Presented here is a new stereo algorithm that produces dense, high-quality, subpixel disparity maps. It offers two improvements over previous algorithms. First, it does not blur disparity values across sharp changes in depth. Second, it can reconstruct the correct correspondence between two images even when there is substantial vertical displacement between them: this algorithm has been tested with rotations up to 10 degrees and vertical translations up to 16 pixels. Although such image pairs require extra processing time, this ability is vital when exact calibration cannot be maintained. The new algorithm depends on two new ideas. First, it exploits the fact that the correct vertical disparity field is due to camera misalignment and, thus, has only a few (significant) degrees of freedom. The algorithm passes camera alignment parameters, not raw disparity fields, between scales. Disparities at individual locations can diverge only slightly from this global model, greatly reducing the algorithm's search space. Second, the new algorithm uses a pre-match filter that prevents two patches of image from matching if they do not have the same (local) topological structure. This constraint subsumes previous “figural continuity” proposals and can be checked by simple, local operations. The filter seems to improve the algorithm's ability to select the correct match from many alternatives and it suppresses intermediate values near sharp changes in disparity. This technique can be extended to other matching tasks, such as motion tracking, analyzing texture periodicity, and evaluating the performance of edge finders.

Journal ArticleDOI
TL;DR: All the usual tools of statistical signal analysis, for example, statistical decision theory for object recognition, can be brought to bear and the information extraction appears to be robust and computationally reasonable; the concepts are geometric and simple; and essentially optimal accuracy should result.
Abstract: A new approach is introduced to estimating object surfaces in three-dimensional space from a sequence of images. A 3D surface of interest here is modeled as a function known up to the values of a few parameters. Surface estimation is then treated as the general problem of maximum-likelihood parameter estimation based on two or more functionally related data sets. In our case, these data sets constitute a sequence of images taken at different locations and orientations. Experiments are run to illustrate the various advantages of using as many images as possible in the estimation and of distributing camera positions from first to last over as large a baseline as possible. In order to extract all the usable information from the sequence of images, all the images should be available simultaneously for the parameter estimation. We introduce the use of asymptotic Bayesian approximations in order to summarize the useful information in a sequence of images, thereby drastically reducing both the storage and the amount of processing required. This leads to a sequential Bayesian estimator for the surface parameters, where the information extracted from previous images is summarized in a quadratic form. The attractiveness of our approach is that now all the usual tools of statistical signal analysis, for example, statistical decision theory for object recognition, can be brought to bear; the information extraction appears to be robust and computationally reasonable; the concepts are geometric and simple; and essentially optimal accuracy should result. Experimental results are shown for extending this approach in two ways. One is to model a highly variable surface as a collection of small patches jointly constituting a stochastic process (e.g., a Markov random field) and to reconstruct this surface using maximum a posteriori probability (MAP) estimation. The other is to cluster together those patches constituting the same primitive object through the use of MAP segmentation. This provides a simultaneous estimation and segmentation of a surface into primitive constituent surfaces.
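
A sketch of the "information summarized in a quadratic form" idea for sequential estimation (assuming linearized Gaussian measurements z = J theta + noise; the class name and toy data are hypothetical): each image contributes to an accumulated quadratic form, so previous images need not be stored.

```python
import numpy as np

class QuadraticSummary:
    """Information-form recursive estimator: accumulate
    Lambda = sum J^T R^-1 J and eta = sum J^T R^-1 z;
    the running estimate is Lambda^-1 eta."""
    def __init__(self, dim):
        self.Lam = np.zeros((dim, dim))
        self.eta = np.zeros(dim)
    def update(self, J, z, R):
        Ri = np.linalg.inv(R)
        self.Lam += J.T @ Ri @ J
        self.eta += J.T @ Ri @ z
    def estimate(self):
        return np.linalg.solve(self.Lam, self.eta)

# Usage: fuse three noisy linear 'views' of theta = [1, 2] (toy data).
rng = np.random.default_rng(5)
theta = np.array([1.0, 2.0])
est = QuadraticSummary(2)
for _ in range(3):
    J = rng.standard_normal((4, 2))
    z = J @ theta + 0.01 * rng.standard_normal(4)
    est.update(J, z, 0.0001 * np.eye(4))
print(est.estimate())   # close to [1, 2]
```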

Journal ArticleDOI
TL;DR: It was observed that trinocular local matching reduced the percentage of mismatches having large disparity errors by more than half when compared to binocular matching, and increased the computational cost of local matching over binocular by only one-fourth.
Abstract: This paper looks at the twin issues of the gain in accuracy of stereo correspondence and the accompanying increase in computational cost due to the use of a third camera for stereo analysis. Trinocular stereo algorithms differ from binocular algorithms essentially in the epipolar constraint used in the local matching stage. The current literature does not provide any insight into the relative merits of binocular and trinocular stereo matching with the matching accuracy being verified against the ground truth. Experiments for evaluating the relative performance of binocular and trinocular stereo algorithms were conducted. The stereo images used for the performance evaluation were generated by applying a Lambertian reflectance model to real Digital Elevation Maps (DEMs) available from the U.S. Geological Survey. The matching accuracy of the stereo algorithms was evaluated by comparing the observed stereo disparity against the ground truth derived from the DEMs. It was observed that trinocular local matching reduced the percentage of mismatches having large disparity errors by more than half when compared to binocular matching. On the other hand, trinocular stereopsis increased the computational cost of local matching over binocular by only about one-fourth. We also present a quantization-error analysis of the depth reconstruction process for the nonparallel stereo-imaging geometry used in our experiments.


Journal Article
TL;DR: Linear time algorithms are given for computing the chessboard distance transform for both pointer-based and linear quadtree representations, both of which consist of a pair of tree traversals.
Abstract: Linear time algorithms are given for computing the chessboard distance transform for both pointer-based and linear quadtree representations. Comparisons between algorithmic styles for the two representations are made. Both versions of the algorithm consist of a pair of tree traversals.
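
For intuition about what the quadtree algorithms compute, here is the standard two-pass chessboard distance transform on a plain array (this is the classic raster-scan chamfer method, not the paper's quadtree traversals):

```python
import numpy as np

def chessboard_dt(obstacles):
    """Two-pass chessboard (L-infinity) distance transform.
    obstacles: boolean array, True where the distance is zero."""
    h, w = obstacles.shape
    INF = h + w
    d = np.where(obstacles, 0, INF).astype(int)
    # forward pass: propagate from already-scanned N, NW, NE, W neighbors
    for i in range(h):
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i-1, j] + 1)
                if j > 0:     d[i, j] = min(d[i, j], d[i-1, j-1] + 1)
                if j < w - 1: d[i, j] = min(d[i, j], d[i-1, j+1] + 1)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j-1] + 1)
    # backward pass: propagate from S, SW, SE, E neighbors
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i+1, j] + 1)
                if j > 0:     d[i, j] = min(d[i, j], d[i+1, j-1] + 1)
                if j < w - 1: d[i, j] = min(d[i, j], d[i+1, j+1] + 1)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j+1] + 1)
    return d

# Usage: distance to a single point equals max(|dx|, |dy|).
obs = np.zeros((7, 7), dtype=bool); obs[3, 3] = True
print(chessboard_dt(obs))
```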