
Showing papers in "International Journal of Computer Vision in 1999"


Journal ArticleDOI
TL;DR: A novel scene reconstruction technique is presented, different from previous approaches in its ability to cope with large changes in visibility and its modeling of intrinsic scene color and texture information.
Abstract: A novel scene reconstruction technique is presented, different from previous approaches in its ability to cope with large changes in visibility and its modeling of intrinsic scene color and texture information. The method avoids image correspondence problems by working in a discretized scene space whose voxels are traversed in a fixed visibility ordering. This strategy takes full account of occlusions and allows the input cameras to be far apart and widely distributed about the environment. The algorithm identifies a special set of invariant voxels which together form a spatial and photometric reconstruction of the scene, fully consistent with the input images. The approach is evaluated with images from both inward-facing and outward-facing cameras.
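
As a rough sketch of the voxel-consistency idea (not the authors' exact algorithm), assuming a hypothetical sample_colors(voxel) helper that returns the color the voxel projects to in each input camera that sees it; the occlusion bookkeeping enabled by the fixed visibility ordering is elided:

    import numpy as np

    def photo_consistent(pixel_colors, threshold=15.0):
        # pixel_colors: (k, 3) RGB samples, one per camera that sees the voxel.
        # The voxel is kept if its samples cluster tightly around their mean.
        colors = np.asarray(pixel_colors, dtype=float)
        if len(colors) < 2:
            return True  # too few views to contradict the voxel
        spread = np.sqrt(((colors - colors.mean(axis=0)) ** 2).sum(axis=1))
        return spread.max() < threshold

    def reconstruct(voxels_in_visibility_order, sample_colors):
        kept = []
        for v in voxels_in_visibility_order:
            if photo_consistent(sample_colors(v)):
                kept.append(v)  # consistent voxel joins the reconstruction
        return kept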

971 citations


Journal ArticleDOI
TL;DR: This work presents a multiscale method in which a nonlinear diffusion filter is steered by the so-called interest operator (second-moment matrix, structure tensor), and an m-dimensional formulation of this method is analysed with respect to its well-posedness and scale-space properties.
Abstract: The completion of interrupted lines or the enhancement of flow-like structures is a challenging task in computer vision, human vision, and image processing. We address this problem by presenting a multiscale method in which a nonlinear diffusion filter is steered by the so-called interest operator (second-moment matrix, structure tensor). An m-dimensional formulation of this method is analysed with respect to its well-posedness and scale-space properties. An efficient scheme is presented which uses a stabilization by a semi-implicit additive operator splitting (AOS), and the scale-space behaviour of this method is illustrated by applying it to both 2-D and 3-D images.
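
For intuition, a 1-D sketch of the semi-implicit step that AOS builds on (not the paper's anisotropic m-dimensional scheme): each implicit diffusion step (I - tau*A(g)) u_new = u is a tridiagonal solve, and AOS averages such one-dimensional solves taken along each axis.

    import numpy as np
    from scipy.linalg import solve_banded

    def semi_implicit_step_1d(u, g, tau):
        # One step of u_t = (g u_x)_x with Neumann boundaries:
        # solve (I - tau * A(g)) u_new = u, where A is tridiagonal.
        n = len(u)
        gh = 0.5 * (g[:-1] + g[1:])            # conductivity at half-grid points
        diag = np.ones(n)
        diag[:-1] += tau * gh
        diag[1:] += tau * gh
        upper = np.zeros(n); upper[1:] = -tau * gh
        lower = np.zeros(n); lower[:-1] = -tau * gh
        ab = np.vstack([upper, diag, lower])   # banded storage for solve_banded
        return solve_banded((1, 1), ab, u)

The implicit step is unconditionally stable, which is what lets AOS take large time steps.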

868 citations


Journal ArticleDOI
TL;DR: A theoretical proof is given which shows that the absence of skew in the image plane is sufficient to allow for self-calibration and a method to detect critical motion sequences is proposed.
Abstract: In this paper the theoretical and practical feasibility of self-calibration in the presence of varying intrinsic camera parameters is under investigation. The paper's main contribution is to propose a self-calibration method which efficiently deals with all kinds of constraints on the intrinsic camera parameters. Within this framework a practical method is proposed which can retrieve metric reconstruction from image sequences obtained with uncalibrated zooming/focusing cameras. The feasibility of the approach is illustrated on real and synthetic examples. Besides this, a theoretical proof is given which shows that the absence of skew in the image plane is sufficient to allow for self-calibration. A counting argument is developed which—depending on the set of constraints—gives the minimum sequence length for self-calibration, and a method to detect critical motion sequences is proposed.

829 citations


Journal ArticleDOI
TL;DR: This paper derives the complete class of single-lens single-mirror catadioptric sensors that have a single viewpoint, and describes all of the solutions in detail, including the degenerate ones, with reference to many of the catadioptric systems that have been proposed in the literature.
Abstract: Conventional video cameras have limited fields of view which make them restrictive for certain applications in computational vision. A catadioptric sensor uses a combination of lenses and mirrors placed in a carefully arranged configuration to capture a much wider field of view. One important design goal for catadioptric sensors is choosing the shapes of the mirrors in a way that ensures that the complete catadioptric system has a single effective viewpoint. The reason a single viewpoint is so desirable is that it is a requirement for the generation of pure perspective images from the sensed images. In this paper, we derive the complete class of single-lens single-mirror catadioptric sensors that have a single viewpoint. We describe all of the solutions in detail, including the degenerate ones, with reference to many of the catadioptric systems that have been proposed in the literature. In addition, we derive a simple expression for the spatial resolution of a catadioptric sensor in terms of the resolution of the cameras used to construct it. Moreover, we include detailed analysis of the defocus blur caused by the use of a curved mirror in a catadioptric sensor.

684 citations


Journal ArticleDOI
TL;DR: An algorithm to detect depth discontinuities from a stereo pair of images is presented, which handles large untextured regions, uses a measure of pixel dissimilarity that is insensitive to image sampling, and prunes bad search nodes to increase the speed of dynamic programming.
Abstract: An algorithm to detect depth discontinuities from a stereo pair of images is presented. The algorithm matches individual pixels in corresponding scanline pairs, while allowing occluded pixels to remain unmatched, then propagates the information between scanlines by means of a fast postprocessor. The algorithm handles large untextured regions, uses a measure of pixel dissimilarity that is insensitive to image sampling, and prunes bad search nodes to increase the speed of dynamic programming. The computation is relatively fast, taking about 600 nanoseconds per pixel per disparity on a personal computer. Approximate disparity maps and precise depth discontinuities (along both horizontal and vertical boundaries) are shown for several stereo image pairs containing textured, untextured, fronto-parallel, and slanted objects in indoor and outdoor scenes.
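
The sampling-insensitive dissimilarity is that of Birchfield and Tomasi: a pixel is compared against the linearly interpolated neighborhood of its candidate match, and symmetrically. A sketch on 1-D scanlines (image boundary handling elided):

    import numpy as np

    def bt_dissimilarity(IL, IR, x, d):
        # d(x, x-d) = min over the two asymmetric half-measures.
        def half(I, J, xi, xj):
            a, b, c = J[xj], 0.5 * (J[xj] + J[xj - 1]), 0.5 * (J[xj] + J[xj + 1])
            lo, hi = min(a, b, c), max(a, b, c)
            return max(0.0, I[xi] - hi, lo - I[xi])
        return min(half(IL, IR, x, x - d), half(IR, IL, x - d, x))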

618 citations


Journal ArticleDOI
TL;DR: It is shown that the object's visible surface f(x, y) is indistinguishable from a “generalized bas-relief” transformation of the object's geometry, and a corresponding transformation on the object's albedo, which results in an identical image of the transformed object illuminated by similarly transformed light sources.
Abstract: When an unknown object with Lambertian reflectance is viewed orthographically, there is an implicit ambiguity in determining its 3-d structure: we show that the object's visible surface f(x, y) is indistinguishable from a “generalized bas-relief” transformation of the object's geometry, f̄(x, y) = λf(x, y) + μx + νy, and a corresponding transformation on the object's albedo. For each image of the object illuminated by an arbitrary number of distant light sources, there exists an identical image of the transformed object illuminated by similarly transformed light sources. This result holds both for the illuminated regions of the object as well as those in cast and attached shadows. Furthermore, neither small motion of the object nor of the viewer will resolve the ambiguity in determining the flattening (or scaling) λ of the object's surface. Implications of this ambiguity on structure recovery and shape representation are discussed.
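
A minimal numeric check of the ambiguity, using the standard matrix form of the GBR transformation: surface points (x, y, f) map by G, albedo-scaled normals by inv(G).T, and light sources by G, so every rendered intensity is preserved:

    import numpy as np

    lam, mu, nu = 0.5, 0.1, -0.2
    G = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [mu,  nu,  lam]])    # (x, y, f) -> (x, y, mu*x + nu*y + lam*f)

    rng = np.random.default_rng(0)
    b = rng.normal(size=3)             # albedo-scaled surface normal at a point
    s = rng.normal(size=3)             # distant light source (direction * strength)
    b_bar = np.linalg.inv(G).T @ b     # transformed normal
    s_bar = G @ s                      # similarly transformed light source
    assert np.isclose(b @ s, b_bar @ s_bar)   # identical image intensity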

598 citations


Journal ArticleDOI
TL;DR: This work significantly improves upon existing DP stereo matching methods by showing that while some cost must be assigned to unmatched pixels, sensitivity to occlusion-cost and algorithmic complexity can be significantly reduced when highly-reliable matches, or ground control points, are incorporated into the matching process.
Abstract: A method for solving the stereo matching problem in the presence of large occlusion is presented. A data structure—the disparity space image—is defined to facilitate the description of the effects of occlusion on the stereo matching process and in particular on dynamic programming (DP) solutions that find matches and occlusions simultaneously. We significantly improve upon existing DP stereo matching methods by showing that while some cost must be assigned to unmatched pixels, sensitivity to occlusion-cost and algorithmic complexity can be significantly reduced when highly-reliable matches, or ground control points, are incorporated into the matching process. The use of ground control points eliminates both the need for biasing the process towards a smooth solution and the task of selecting critical prior probabilities describing image formation. Finally, we describe how the detection of intensity edges can be used to bias the recovered solution such that occlusion boundaries will tend to be proposed along such edges, reflecting the observation that occlusion boundaries usually cause intensity discontinuities.

524 citations


Journal ArticleDOI
TL;DR: A new method for image rectification, the process of resampling pairs of stereo images taken from widely differing viewpoints in order to produce a pair of “matched epipolar projections”, based on an examination of the fundamental matrix of Longuet-Higgins which describes the epipolar geometry of the image pair.
Abstract: This paper gives a new method for image rectification, the process of resampling pairs of stereo images taken from widely differing viewpoints in order to produce a pair of “matched epipolar projections”. These are projections in which the epipolar lines run parallel with the x-axis and consequently, disparities between the images are in the x-direction only. The method is based on an examination of the fundamental matrix of Longuet-Higgins which describes the epipolar geometry of the image pair. The approach taken is consistent with that advocated by Faugeras (1992) of avoiding camera calibration. The paper uses methods of projective geometry to determine a pair of 2D projective transformations to be applied to the two images in order to match the epipolar lines. The advantages include the simplicity of the 2D projective transformation which allows very fast resampling as well as subsequent simplification in the identification of matched points and scene reconstruction.
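
A sketch of the two core steps under this formulation, assuming a known fundamental matrix F and a finite epipole: the epipoles fall out of F's null spaces, and a Hartley-style homography G·R·T sends an epipole to the point at infinity so that epipolar lines become horizontal (computing the matching transform for the second image is elided):

    import numpy as np

    def epipoles(F):
        # F e = 0 and F.T e' = 0; take the smallest right singular vectors.
        e = np.linalg.svd(F)[2][-1]
        e_prime = np.linalg.svd(F.T)[2][-1]
        return e / e[2], e_prime / e_prime[2]

    def epipole_to_infinity(e, width, height):
        T = np.array([[1, 0, -width / 2], [0, 1, -height / 2], [0, 0, 1.0]])
        ex, ey, _ = T @ e
        th = np.arctan2(ey, ex)
        c, s = np.cos(-th), np.sin(-th)
        R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])  # epipole onto x-axis
        f = (R @ T @ e)[0]
        G = np.array([[1, 0, 0], [0, 1, 0], [-1 / f, 0, 1.0]])  # (f,0,1)->(f,0,0)
        return G @ R @ T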

459 citations


Journal ArticleDOI
TL;DR: A general tridimensional reconstruction algorithm of range and volumetric images, based on deformable simplex meshes, which can handle surfaces without any restriction on their shape or topology.
Abstract: In this paper, we propose a general tridimensional reconstruction algorithm of range and volumetric images, based on deformable simplex meshes. Simplex meshes are topologically dual of triangulations and have the advantage of permitting smooth deformations in a simple and efficient manner. Our reconstruction algorithm can handle surfaces without any restriction on their shape or topology. The different tasks performed during the reconstruction include the segmentation of given objects in the scene, the extrapolation of missing data, and the control of smoothness, density, and geometric quality of the reconstructed meshes. The reconstruction takes place in two stages. First, the initialization stage creates a simplex mesh in the vicinity of the data model either manually or using an automatic procedure. Then, after a few iterations, the mesh topology can be modified by creating holes or by increasing its genus. Finally, an iterative refinement algorithm decreases the distance of the mesh from the data while preserving high geometric and topological quality. Several reconstruction examples are provided with quantitative and qualitative results.

366 citations


Journal ArticleDOI
Jing Huang, S. Ravi Kumar, Mandar Mitra, Wei-Jing Zhu, Ramin Zabih
TL;DR: Experimental evidence shows that the color correlogram outperforms not only the traditional color histogram method but also the recently proposed histogram refinement methods for image indexing/retrieval.
Abstract: We define a new image feature called the color correlogram and use it for image indexing and comparison. This feature distills the spatial correlation of colors and when computed efficiently, turns out to be both effective and inexpensive for content-based image retrieval. The correlogram is robust in tolerating large changes in appearance and shape caused by changes in viewing position, camera zoom, etc. Experimental evidence shows that this new feature outperforms not only the traditional color histogram method but also the recently proposed histogram refinement methods for image indexing/retrieval. We also provide a technique to cut down the storage requirement of the correlogram so that it is the same as that of histograms, with only negligible performance penalty compared to the original correlogram. We also suggest the use of color correlogram as a generic indexing tool to tackle various problems arising from image retrieval and video browsing. We adapt the correlogram to handle the problems of image subregion querying, object localization, object tracking, and cut detection. Experimental results again suggest that the color correlogram is more effective than the histogram for these applications, with insignificant additional storage or processing cost.
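
A sketch of an autocorrelogram (the diagonal of the full correlogram), assuming a pre-quantized image of color indices and sampling only the four axial neighbors at each distance rather than the full L-infinity ring, a cheap approximation:

    import numpy as np

    def autocorrelogram(img, n_colors, distances=(1, 3, 5, 7)):
        # result[c, i]: estimated probability that a pixel at distance
        # distances[i] from a pixel of color c also has color c.
        h, w = img.shape
        result = np.zeros((n_colors, len(distances)))
        ys, xs = np.mgrid[0:h, 0:w]
        for i, d in enumerate(distances):
            same = np.zeros(n_colors)
            total = np.zeros(n_colors)
            for dy, dx in ((0, d), (0, -d), (d, 0), (-d, 0)):
                yn, xn = ys + dy, xs + dx
                ok = (yn >= 0) & (yn < h) & (xn >= 0) & (xn < w)
                c = img[ys[ok], xs[ok]]
                hit = img[yn[ok], xn[ok]] == c
                np.add.at(total, c, 1)
                np.add.at(same, c[hit], 1)
            result[:, i] = same / np.maximum(total, 1)
        return result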

337 citations


Journal ArticleDOI
TL;DR: This paper formulates and solves a new variant of the stereo correspondence problem: simultaneously recovering the disparities, true colors, and opacities of visible surface elements in a generalized 3D disparity space.
Abstract: This paper formulates and solves a new variant of the stereo correspondence problem: simultaneously recovering the disparities, true colors, and opacities of visible surface elements. This problem arises in newer applications of stereo reconstruction, such as view interpolation and the layering of real imagery with synthetic graphics for special effects and virtual studio applications. While this problem is intrinsically more difficult than traditional stereo correspondence, where only the disparities are being recovered, it provides a principled way of dealing with commonly occurring problems such as occlusions and the handling of mixed (foreground/background) pixels near depth discontinuities. It also provides a novel means for separating foreground and background objects (matting), without the use of a special blue screen. We formulate the problem as the recovery of colors and opacities in a generalized 3D (x, y, d) disparity space, and solve the problem using a combination of initial evidence aggregation followed by iterative energy minimization.

Journal ArticleDOI
TL;DR: It is proved that each topographic map represents a class of images invariant with respect to local contrast changes, where the subjacent occlusion-transparency structure is put into evidence by the interplay of level lines.
Abstract: We call “natural” image any photograph of an outdoor or indoor scene taken by a standard camera. We discuss the physical generation process of natural images as a combination of occlusions, transparencies and contrast changes. This description fits the phenomenological description of Gaetano Kanizsa, according to which visual perception tends to remain stable with respect to these basic operations. We define a contrast invariant presentation of the digital image, the topographic map, where the subjacent occlusion-transparency structure is put into evidence by the interplay of level lines. We prove that each topographic map represents a class of images invariant with respect to local contrast changes. Several visualization strategies of the topographic map are proposed and implemented, and mathematical arguments are developed to establish stability properties of the topographic map under digitization.
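
A toy check of the claimed invariance: the upper level sets, whose boundaries are the level lines of the topographic map, survive any increasing contrast change unchanged; only their labels move.

    import numpy as np

    def upper_level_set(img, lam):
        # One element of the topographic map; its boundary is a level line.
        return img >= lam

    img = np.random.default_rng(0).integers(0, 256, size=(64, 64))
    g = np.sqrt                                  # an increasing contrast change
    assert (upper_level_set(img, 50) == upper_level_set(g(img), g(50))).all()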

Journal ArticleDOI
TL;DR: A method of learning generative models of objects from a set of images of the object under different, and unknown, illumination that allows us to approximate the objects' appearance under a range of lighting conditions is described.
Abstract: We describe a method of learning generative models of objects from a set of images of the object under different, and unknown, illumination. Such a model allows us to approximate the objects' appearance under a range of lighting conditions. This work is closely related to photometric stereo with unknown light sources and, in particular, to the use of Singular Value Decomposition (SVD) to estimate shape and albedo from multiple images up to a linear transformation (Hayakawa, 1994). Firstly, we analyze and extend the SVD approach to this problem. We demonstrate that it applies to objects for which the dominant imaging effects are Lambertian reflectance with a distant light source and a background ambient term. To determine that this is a reasonable approximation we calculate the eigenvectors of the SVD on a set of real objects, under varying lighting conditions, and demonstrate that the first few eigenvectors account for most of the data, in agreement with our predictions. We then analyze the linear ambiguities in the SVD approach and demonstrate that previous methods proposed to resolve them (Hayakawa, 1994) are only valid under certain conditions. We discuss alternative possibilities and, in particular, demonstrate that knowledge of the object class is sufficient to resolve this problem. Secondly, we describe the use of surface consistency for putting constraints on the possible solutions. We prove that this constraint reduces the ambiguities to a subspace called the generalized bas-relief (GBR) ambiguity, which is inherent in the Lambertian reflectance function (and which can be shown to exist even if attached and cast shadows are present (Belhumeur et al., 1997)). We demonstrate the use of surface consistency to solve for the shape and albedo up to a GBR and describe, and implement, a variety of additional assumptions to resolve the GBR. Thirdly, we demonstrate an iterative algorithm that can detect and remove some attached shadows from the objects, thereby increasing the accuracy of the reconstructed shape and albedo.
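
The SVD step the abstract builds on, in brief: for a Lambertian object under n distant lights the image matrix factors as I = S B (lights times albedo-scaled normals), hence has rank 3, and the SVD recovers both factors up to a 3x3 linear ambiguity A, since I = (S A)(inv(A) B). A sketch:

    import numpy as np

    def pseudo_shape_and_lights(images):
        # images: (n_images, n_pixels), one row per illumination condition.
        U, s, Vt = np.linalg.svd(images, full_matrices=False)
        S_hat = U[:, :3] * np.sqrt(s[:3])          # pseudo light sources (n, 3)
        B_hat = np.sqrt(s[:3])[:, None] * Vt[:3]   # pseudo normals*albedo (3, p)
        return S_hat, B_hat          # true S, B = S_hat @ A, inv(A) @ B_hat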

Journal ArticleDOI
TL;DR: The aim of this work is the recovery of 3D structure and camera projection matrices for each frame of an uncalibrated image sequence, and investigates two strategies for tackling degeneracies, including a statistical model selection test to identify when degeneracies occur.
Abstract: The aim of this work is the recovery of 3D structure and camera projection matrices for each frame of an uncalibrated image sequence. In order to achieve this, correspondences are required throughout the sequence. A significant and successful mechanism for automatically establishing these correspondences is the use of geometric constraints arising from scene rigidity. However, problems arise with such geometry guided matching if general viewpoint and general structure are assumed whilst frames in the sequence and/or scene structure do not conform to these assumptions. Such cases are termed degenerate. In this paper we describe two important cases of degeneracy and their effects on geometry guided matching. The cases are a motion degeneracy, where the camera does not translate between frames, and a structure degeneracy, where the viewed scene structure is planar. The effects include the loss of correspondences due to under- or over-fitting of geometric models estimated from image data, leading to the failure of the tracking method. These degeneracies are not a theoretical curiosity, but commonly occur in real sequences where models are statistically estimated from image points with measurement error. We investigate two strategies for tackling such degeneracies: the first uses a statistical model selection test to identify when degeneracies occur; the second uses multiple motion models to overcome the degeneracies. The strategies are evaluated on real sequences varying in motion, scene type, and length from 13 to 120 frames.

Journal ArticleDOI
TL;DR: It is argued that locally orderless images are ubiquitous in perception and the visual arts and how to construct and use them in a variety of local and global image processing operations is described.
Abstract: We propose a representation of images in which a global, but not a local, topology is defined. The topology is restricted to resolutions up to the extent of the local region of interest (ROI). Although the ROIs may contain many pixels, there is no spatial order on the pixels within the ROI; the only information preserved is the histogram of pixel values within the ROIs. This can be considered as an extreme case of a textel (texture element) image: the histogram is the limit of texture where the spatial order has been completely disregarded. We argue that locally orderless images are ubiquitous in perception and the visual arts. Formally, the orderless images are most aptly described by three mutually intertwined scale spaces. The scale parameters correspond to the pixellation (“inner scale”), the extent of the ROIs (“outer scale”) and the resolution in the histogram (“tonal scale”). We describe how to construct locally orderless images, how to render them, and how to use them in a variety of local and global image processing operations.
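
A compact sketch of the three intertwined scale spaces, assuming a grayscale image whose inner scale is whatever blur it already carries: each tonal bin becomes a soft indicator image (tonal scale), spatially averaged over the ROI (outer scale):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def locally_orderless(img, bin_centers, sigma_outer, sigma_tonal):
        # Returns (n_bins, H, W): a local histogram at every pixel.
        layers = []
        for c in bin_centers:
            soft_bin = np.exp(-0.5 * ((img - c) / sigma_tonal) ** 2)  # tonal scale
            layers.append(gaussian_filter(soft_bin, sigma_outer))    # outer scale
        return np.stack(layers)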

Journal ArticleDOI
TL;DR: This paper demonstrates an automatic system for telling whether there are human nudes present in an image, which marks skin-like pixels using combined color and texture properties and feeds them to a specialized grouper, which attempts to group a human figure using geometric constraints on human structure.
Abstract: This paper demonstrates an automatic system for telling whether there are human nudes present in an image. The system marks skin-like pixels using combined color and texture properties. These skin regions are then fed to a specialized grouper, which attempts to group a human figure using geometric constraints on human structure. If the grouper finds a sufficiently complex structure, the system decides a human is present. The approach is shown to be effective for a wide range of shades and colors of skin and human configurations. This approach offers an alternate view of object recognition, where an object model is an organized collection of grouping hints obtained from a combination of constraints on color and texture and constraints on geometric properties such as the structure of individual parts and the relationships between parts. The system demonstrates excellent performance on a test set of 565 uncontrolled images of human nudes, mostly obtained from the internet, and 4289 assorted control images, drawn from a wide variety of sources.
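
A crude sketch of the first stage only, marking skin-like pixels by color; the thresholds here are illustrative placeholders, not the paper's trained values, and the texture test and the geometric grouper are omitted:

    import numpy as np

    def skin_mask(rgb):
        # Log-opponent-style coordinates: skin is modestly reddish and
        # not strongly saturated toward blue.
        r, g, b = (rgb[..., i].astype(float) + 1.0 for i in range(3))
        redness = np.log(r) - np.log(g)
        blueness = np.log(b) - 0.5 * (np.log(r) + np.log(g))
        return (redness > 0.05) & (redness < 0.6) & (blueness < 0.1)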

Journal ArticleDOI
TL;DR: The results indicate that model-based tracking of rigid objects in monocular image sequences may have to be reappraised more thoroughly than anticipated during the recent past.
Abstract: A model-based vehicle tracking system for the evaluation of inner-city traffic video sequences has been systematically tested on about 15 minutes of real world video data. Methodological improvements during preparatory test phases affected—among other changes—the combination of edge element and optical flow estimates in the measurement process and a more consistent exploitation of background knowledge. The explication of this knowledge in the form of models facilitates the evaluation of video data for different scenes by exchanging the scene-dependent models. An extensive series of experiments with a large test sample demonstrates that the current version of our system appears to have reached a relative optimum: further interactive tuning of tracking parameters no longer promises to improve the overall system performance significantly. Even the incorporation of further knowledge regarding vehicle and scene geometry or illumination has to cope with an increasing level of interaction between different knowledge sources and system parameters. Our results indicate that model-based tracking of rigid objects in monocular image sequences may have to be reappraised more thoroughly than anticipated during the recent past.

Journal ArticleDOI
TL;DR: A novel calibration method using 4 known non-coplanar sets of 3 collinear world points and with no prior knowledge of the perspective projection matrix of the camera is presented, showing that world points lying on each light stripe plane can be computed.
Abstract: The problem associated with calibrating a structured light stripe system is that known world points on the calibration target do not normally fall onto every light stripe plane illuminated from the projector. We present in this paper a novel calibration method that employs the invariance of the cross ratio to overcome this problem. Using 4 known non-coplanar sets of 3 collinear world points and with no prior knowledge of the perspective projection matrix of the camera, we show that world points lying on each light stripe plane can be computed. Furthermore, by incorporating the homography between the light stripe and image planes, the 4 × 3 image-to-world transformation matrix for each stripe plane can also be recovered. The experiments conducted suggest that this novel calibration method is robust, economical, and is applicable to many dense shape reconstruction tasks.
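
The invariant at work is the classical cross ratio of four collinear points. A sketch in 1-D coordinates along the line: measure the ratio in the image, then solve it back for the unknown world point on the stripe plane:

    def cross_ratio(a, b, c, d):
        # Invariant under perspective projection.
        return ((a - c) * (b - d)) / ((a - d) * (b - c))

    def solve_fourth(a, b, c, k):
        # Given world coordinates a, b, c of three collinear points and the
        # cross ratio k = cross_ratio(a', b', c', d') measured in the image,
        # recover the world coordinate d of the fourth point.
        return (k * (b - c) * a - (a - c) * b) / (k * (b - c) - (a - c))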

Journal ArticleDOI
James H. Elder
TL;DR: A novel method for inverting the edge code to reconstruct a perceptually accurate estimate of the original image is reported, and thus it is demonstrated that the proposed representation embodies virtually all of the perceptually relevant information contained in a natural image.
Abstract: We address the problem of computing a general-purpose early visual representation that satisfies two criteria. 1) Explicitness: To be more useful than the original pixel array, the representation must take a significant step toward making important image structure explicit. 2) Completeness: To support a diverse set of high-level tasks, the representation must not discard information of potential perceptual relevance. The most prevalent representation in image processing and computer vision that satisfies the completeness criterion is the wavelet code. In this paper, we propose a very different code which represents the location of each edge and the magnitude and blur scale of the underlying intensity change. By making edge structure explicit, we argue that this representation better satisfies the first criterion than do wavelet codes. To address the second criterion, we study the question of how much visual information is lost in the representation. We report a novel method for inverting the edge code to reconstruct a perceptually accurate estimate of the original image, and thus demonstrate that the proposed representation embodies virtually all of the perceptually relevant information contained in a natural image. This result bears on recent claims that edge representations do not contain all of the information needed for higher level tasks.

Journal ArticleDOI
TL;DR: A new algorithm for solving the stereo correspondence problem with a global 2-d optimization by transforming it into a maximum-flow problem in a graph, which effectively removes explicit use of epipolar geometry, thus allowing direct use of multiple cameras with arbitrary geometries.
Abstract: This paper describes a new algorithm for solving the stereo correspondence problem with a global 2-d optimization by transforming it into a maximum-flow problem in a graph. This transformation effectively removes explicit use of epipolar geometry, thus allowing direct use of multiple cameras with arbitrary geometries. The maximum-flow, solved both efficiently and globally, yields a minimum-cut that corresponds to a disparity surface for the whole image at once. This global and efficient approach to stereo analysis allows the reconstruction to proceed in an arbitrary volume of space and provides a more accurate and coherent depth map than the traditional stereo algorithms. In particular, smoothness is applied uniformly instead of only along epipolar lines, while the global optimality of the depth surface is guaranteed. Results show improved depth estimation as well as better handling of depth discontinuities. While the worst case running time is O(n^1.5 d^1.5 log(nd)), the observed average running time is O(n^1.2 d^1.3) for an image size of n pixels and depth resolution d.

Journal ArticleDOI
TL;DR: The BRDF (Bidirectional Reflection Distribution Function) at the mega scale of opaque surfaces that are rough on the macro and micro scale is derived, which means one can do exact calculations for a surface geometry that is physically realizable.
Abstract: We derive the BRDF (Bidirectional Reflection Distribution Function) at the mega scale of opaque surfaces that are rough on the macro and micro scale. The roughness at the micro scale is modeled as a uniform, isotropically scattering, Lambertian surface. At the macro scale the roughness is modeled by way of a distribution of spherical concavities. These pits influence the BRDF via vignetting, cast shadow, interreflection and interposition, causing it to differ markedly from Lambertian. Pitted surfaces show strong backward scattering (so called “opposition effect”). When we assume that the macro scale can be resolved, the radiance histogram and the spatial structure of the textons of the textured surface (at the mega scale) can be calculated. This is the main advantage of the model over previous ones: One can do exact (numerical) calculations for a surface geometry that is physically realizable.

Journal ArticleDOI
TL;DR: A new measure of perceptual saliency is proposed, and its ability to detect natural shapes in cluttered backgrounds is quantitatively compared to five previously proposed measures; the new measure significantly outperforms the previous ones.
Abstract: We propose a new measure of perceptual saliency and quantitatively compare its ability to detect natural shapes in cluttered backgrounds to five previously proposed measures. As defined in the new measure, the saliency of an edge is the fraction of closed random walks which contain that edge. The transition-probability matrix defining the random walk between edges is based on a distribution of natural shapes modeled by a stochastic motion. Each of the saliency measures in our comparison is a function of a set of affinity values assigned to pairs of edges. Although the authors of each measure define the affinity between a pair of edges somewhat differently, all incorporate the Gestalt principles of good-continuation and proximity in some form. In order to make the comparison meaningful, we use a single definition of affinity and focus instead on the performance of the different functions for combining affinity values. The primary performance criterion is accuracy. We compute false-positive rates in classifying edges as signal or noise for a large set of test figures. In almost every case, the new measure significantly outperforms previous measures.
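
One way to realize the closed-walk fraction numerically, assuming the edge-to-edge transition matrix P has already been assembled from the affinities (a sketch in the spirit of the measure, not the authors' exact estimator): for long walks the visit frequency of edge i is governed by P's dominant left and right eigenvectors:

    import numpy as np

    def closed_walk_saliency(P):
        w, V = np.linalg.eig(P)
        v = V[:, np.argmax(w.real)].real           # right dominant eigenvector
        wl, U = np.linalg.eig(P.T)
        u = U[:, np.argmax(wl.real)].real          # left dominant eigenvector
        s = u * v                                  # edge-wise visit frequency
        return s / s.sum()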

Journal ArticleDOI
TL;DR: A new approach to the registration of digital angiographic images is proposed that involves an edge-based selection of control points for which the displacement is computed by means of template matching, and from which the complete displacement vector field is constructed by means of interpolation.
Abstract: In clinical practice, Digital Subtraction Angiography (DSA) is a powerful technique for the visualization of blood vessels in the human body. The diagnostic relevance of the images is often reduced by artifacts which arise from the misalignment of successive images in the sequence, due to patient motion. In order to improve the quality of the subtraction images, several registration techniques have been proposed. However, because of the required computation times, these have never led to algorithms fast enough to be acceptable for integration in clinical applications. In this paper, a new approach to the registration of digital angiographic images is proposed. It involves an edge-based selection of control points for which the displacement is computed by means of template matching, and from which the complete displacement vector field is constructed by means of interpolation. The final warping of the images according to the calculated displacement vector field is performed in real time by graphics hardware. Experimental results with several clinical data sets show that the proposed algorithm is both effective and very fast.

Journal ArticleDOI
TL;DR: In this paper, the epipolar geometry relating the current image taken by the robot and the target image is recovered, and most of the parameters which specify the differences in position and orientation of the camera between the two images are recovered. However, since not all of these parameters can be recovered from two images, the authors have developed specific methods to bypass these missing parameters and resolve the ambiguities that exist.
Abstract: We introduce a novel method for visual homing. Using this method a robot can be sent to desired positions and orientations in 3D space specified by single images taken from these positions. Our method is based on recovering the epipolar geometry relating the current image taken by the robot and the target image. Using the epipolar geometry, most of the parameters which specify the differences in position and orientation of the camera between the two images are recovered. However, since not all of the parameters can be recovered from two images, we have developed specific methods to bypass these missing parameters and resolve the ambiguities that exist. We present two homing algorithms for two standard projection models, weak and full perspective. Our method determines the path of the robot on-line; the starting position of the robot is relatively unconstrained, and a 3D model of the environment is not required. The method is almost entirely memoryless, in the sense that at every step the path to the target position is determined independently of the previous path taken by the robot. Because of this property the robot may be able, while moving toward the target, to perform auxiliary tasks or to avoid obstacles, without this impairing its ability to eventually reach the target position. We have performed simulations and real experiments which demonstrate the robustness of the method and that the algorithms always converge to the target pose.

Journal ArticleDOI
TL;DR: The Incremental Focus of Attention architecture for robust, adaptive, real-time motion tracking is presented and examples show that recovery times after lost tracking depend primarily on the number of objects visually similar to the target in the field of view.
Abstract: We present the Incremental Focus of Attention (IFA) architecture for robust, adaptive, real-time motion tracking. IFA systems combine several visual search and vision-based tracking algorithms into a layered hierarchy. The architecture controls the transitions between layers and executes algorithms appropriate to the visual environment at hand: when conditions are good, tracking is accurate and precise; as conditions deteriorate, more robust, yet less accurate algorithms take over; when tracking is lost altogether, layers cooperate to perform a rapid search for the target and resume tracking. Implemented IFA systems are extremely robust to most common types of temporary visual disturbances. They resist minor visual perturbations and recover quickly after full occlusions, illumination changes, major distractions, and target disappearances. Analysis of the algorithm's recovery times is supported by simulation results and experiments on real data. In particular, examples show that recovery times after lost tracking depend primarily on the number of objects visually similar to the target in the field of view.

Journal ArticleDOI
TL;DR: A multiscale method for MRI brain segmentation is presented which uses both edge and intensity information and shows that both an improvement in accuracy and a reduction in image post-processing can be achieved if edge dependent diffusion is used instead of linear diffusion.
Abstract: Segmentation of MR brain images using intensity values is severely limited owing to field inhomogeneities, susceptibility artifacts and partial volume effects. Edge based segmentation methods suffer from spurious edges and gaps in boundaries. A multiscale method for MRI brain segmentation is presented which uses both edge and intensity information. First a multiscale representation of an image is created, which can be made edge dependent to favor intra-tissue diffusion over inter-tissue diffusion. Subsequently a multiscale linking model (the hyperstack) is used to group voxels into a number of objects based on intensity. It is shown that both an improvement in accuracy and a reduction in image post-processing can be achieved if edge dependent diffusion is used instead of linear diffusion. The combination of edge dependent diffusion and intensity based linking facilitates segmentation of grey matter, white matter and cerebrospinal fluid with minimal user interaction. To segment the total brain (white matter plus grey matter) morphological operations are applied to remove small bridges between the brain and cranium. If the total brain is segmented, grey matter, white matter and cerebrospinal fluid can be segmented by joining a small number of segments. Using a supervised segmentation technique and MRI simulations of a brain phantom for validation, it is shown that the errors are of the order of, or smaller than, those reported in the literature.

Journal ArticleDOI
TL;DR: An optical flow estimation technique is presented which is based on the least-median-of-squares (LMedS) robust regression algorithm enabling more accurate flow estimates to be computed in the vicinity of motion discontinuities.
Abstract: An optical flow estimation technique is presented which is based on the least-median-of-squares (LMedS) robust regression algorithm, enabling more accurate flow estimates to be computed in the vicinity of motion discontinuities. The flow is computed in a blockwise fashion using an affine model. Through the use of overlapping blocks coupled with a block shifting strategy, redundancy is introduced into the computation of the flow. This eliminates blocking effects common in most other techniques based on blockwise processing and also allows flow to be accurately computed in regions containing three distinct motions. A multiresolution version of the technique is also presented, again based on LMedS regression, which enables image sequences containing large motions to be effectively handled. An extensive set of quantitative comparisons with a wide range of previously published methods is carried out using synthetic, realistic (computer generated images of natural scenes with known flow) and natural images. Both angular and absolute flow errors are calculated for those sequences with known optical flow. Displaced frame difference error, used extensively in video compression, is used for those natural scenes with unknown flow. In all of the sequences tested, comparisons with those methods that produce a dense flow field (greater than 80% spatial coverage) show that the LMedS technique produces the least error, irrespective of the error measure used.
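
The LMedS core is easy to state, and a generic-regression sketch (not the paper's full blockwise pipeline) shows why it tolerates motion discontinuities: candidate parameters are scored by the median of squared residuals, so up to half the pixels in a block may obey a different motion without corrupting the fit.

    import numpy as np

    def lmeds_fit(A, y, n_trials=200, seed=0):
        # Fit y ~ A @ theta by least-median-of-squares over random
        # minimal subsets (p constraints for p parameters).
        rng = np.random.default_rng(seed)
        p = A.shape[1]
        best_theta, best_med = None, np.inf
        for _ in range(n_trials):
            idx = rng.choice(len(y), size=p, replace=False)
            theta, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
            med = np.median((A @ theta - y) ** 2)
            if med < best_med:
                best_theta, best_med = theta, med
        return best_theta, best_med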

Journal ArticleDOI
TL;DR: A new method of estimating volumetric deformation by integrating intrinsic instantaneous velocity data with geometrical token displacement information, based upon continuum mechanics principles is presented.
Abstract: Non-rigid motion estimation from image sequences is essential in analyzing and understanding the dynamic behavior of physical objects. One important example is the dense field motion analysis of the cardiac wall, which could potentially help to better understand the physiological processes associated with heart disease and to provide improvement in patient diagnosis and treatment. In this paper, we present a new method of estimating volumetric deformation by integrating intrinsic instantaneous velocity data with geometrical token displacement information, based upon continuum mechanics principles. This object-dependent approach allows the incorporation of physically meaningful constraints into the ill-posed motion recovery problem, and the integration of the two disparate but complementary data sources overcomes some of the limitations of the single-image-source-based motion estimation approaches.

Journal ArticleDOI
TL;DR: An algorithm is given, based on hierarchical subdivision of transformation space, which minimises the measure under the group of affine transformations, given two patterns.
Abstract: We present a new pattern similarity measure that behaves well under affine transformations. Our similarity measure is useful for pattern matching since it is defined on patterns with multiple components, satisfies the metric properties, is invariant under affine transformations, and is robust with respect to perturbation and occlusion. We give an algorithm, based on hierarchical subdivision of transformation space, which minimises our measure under the group of affine transformations, given two patterns. In addition, we present results obtained using an implementation of this algorithm.

Journal ArticleDOI
TL;DR: A simple and inexpensive approach for extracting the three-dimensional shape of objects is presented based on ‘weak structured lighting’, demonstrating that the error in reconstructing the surface is less than 0.5% of the size of the object.
Abstract: A simple and inexpensive approach for extracting the three-dimensional shape of objects is presented. It is based on ‘weak structured lighting’. It requires very little hardware besides the camera: a light source (a desk-lamp or the sun), a stick and a checkerboard. The object, illuminated by the light source, is placed on a stage composed of a ground plane and a back plane; the camera faces the object. The user moves the stick in front of the light source, casting a moving shadow on the scene. The 3D shape of the object is extracted from the spatial and temporal location of the observed shadow. Experimental results are presented on five different scenes (indoor with a desk lamp and outdoor with the sun) demonstrating that the error in reconstructing the surface is less than 0.5% of the size of the object. A mathematical formalism is proposed that simplifies the notation and keeps the algebra compact. A real-time implementation of the system is also presented.
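
The depth recovery in such a system ultimately reduces to intersecting camera rays with the estimated shadow planes. A minimal sketch, with the estimation of each shadow plane from the shadow's trace on the ground and back planes elided:

    import numpy as np

    def ray_plane_intersection(origin, direction, n, d):
        # Plane: n . X + d = 0. The camera ray through a pixel that the
        # shadow edge crossed at time t meets that time's shadow plane
        # at the object surface point.
        t = -(n @ origin + d) / (n @ direction)
        return origin + t * direction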