
Showing papers presented at "British Machine Vision Conference in 1993"


Proceedings ArticleDOI
01 Jan 1993
TL;DR: A model-based approach which allows robust and accurate interpretation using explicit anatomical knowledge is described, based on the extension to 3D of Point Distribution Models (PDMs) and associated image search algorithms.
Abstract: The automatic segmentation and labelling of anatomical structures in 3D medical images is a challenging task of practical importance. We describe a model-based approach which allows robust and accurate interpretation using explicit anatomical knowledge. Our method is based on the extension to 3D of Point Distribution Models (PDMs) and associated image search algorithms. A combination of global, Genetic Algorithm (GA), and local, Active Shape Model (ASM), search is used. We have built a 3D PDM of the human brain describing a number of major structures. Using this model we have obtained automatic interpretations for 30 3D Magnetic Resonance head images from different individuals. The results have been evaluated quantitatively and support our claim of robust and accurate interpretation.

132 citations
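A Point Distribution Model of the kind described above is built by applying principal component analysis to aligned landmark vectors. A minimal 2D sketch (the paper's models are 3D; the square-outline landmark data here is synthetic):

```python
import numpy as np

def build_pdm(shapes):
    """Build a Point Distribution Model from aligned shapes.

    shapes: (n_examples, n_points * dims) array of landmark vectors.
    Returns the mean shape and the principal modes of variation."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Eigen-decomposition of the landmark covariance matrix
    cov = centered.T @ centered / (len(shapes) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    return mean, eigvecs[:, order], eigvals[order]

def generate_shape(mean, modes, b):
    """Synthesise a new shape as mean plus weighted modes."""
    return mean + modes[:, :len(b)] @ np.asarray(b)

# Synthetic example: 50 noisy copies of a square outline (4 2D points)
rng = np.random.default_rng(0)
square = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
shapes = square + 0.05 * rng.standard_normal((50, 8))
mean, modes, var = build_pdm(shapes)
new_shape = generate_shape(mean, modes, [0.1])
```

In an ASM search, the weights `b` are iteratively refitted so the model shape tracks image evidence while staying within the learned variation.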


Proceedings ArticleDOI
01 Jan 1993
TL;DR: The results demonstrate that the addition of grey-level models leads to considerable improvement over earlier schemes, leading to improved reliability and accuracy.
Abstract: We describe methods for locating known structures in images. We have previously described statistical models of shape and shape variability which can be used for this purpose (Active Shape Models). In this paper we show how statistical models of grey-level appearance can be incorporated, leading to improved reliability and accuracy. We describe experiments designed to (i) test how well an ASM can locate an object in a new image, (ii) to assess the effects on performance of varying the model parameters, and (iii) to compare the results using grey-level models with those using a search for strongest edges. The results demonstrate that the addition of grey-level models leads to considerable improvement over earlier schemes.

122 citations


Proceedings ArticleDOI
01 Jan 1993
TL;DR: In this paper, a system that combines stereo vision with a 5-DOF robotic manipulator to enable it to locate and reach for objects in an unstructured environment is described.
Abstract: This paper describes a system that combines stereo vision with a 5-DOF robotic manipulator, to enable it to locate and reach for objects in an unstructured environment. Our system uses an affine stereo algorithm, a simple but robust approximation to the geometry of stereo vision, to estimate positions and surface orientations. It can be calibrated very easily with just four reference points. These are defined by the robot itself, moving the gripper to four known positions (self-calibration). The inevitable small errors are corrected by a feedback mechanism which implements image-based control of the gripper's position and orientation. Integral to this feedback mechanism is the use of affine active contour models which track the real-time motion of the gripper across the two images. Experiments show the system to be remarkably immune to unexpected translations and rotations of the cameras and changes of focal length, even after it has 'calibrated' itself.

64 citations
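Affine stereo treats the map from 3D position to stacked left/right image coordinates as affine, which is why four reference points suffice for calibration. A rough sketch of that idea (the reference points and cameras here are hypothetical; the paper's system adds image-based feedback control on top of this):

```python
import numpy as np

def calibrate_affine_stereo(world_pts, image_pts):
    """Fit an affine map from 3D world points to stacked (uL, vL, uR, vR)
    stereo image coordinates.  Four non-coplanar reference points suffice.
    world_pts: (n, 3), image_pts: (n, 4).  Returns A (4, 3) and b (4,)."""
    n = len(world_pts)
    X = np.hstack([world_pts, np.ones((n, 1))])   # homogeneous coordinates
    # Least-squares solve image = [A | b] [X; 1], one column per output
    M, *_ = np.linalg.lstsq(X, image_pts, rcond=None)
    return M[:3].T, M[3]

def triangulate(A, b, image_pt):
    """Recover a 3D point from its stacked stereo measurement."""
    X, *_ = np.linalg.lstsq(A, np.asarray(image_pt) - b, rcond=None)
    return X

# Hypothetical ground-truth affine cameras for a round-trip check
rng = np.random.default_rng(1)
A_true = rng.standard_normal((4, 3))
b_true = rng.standard_normal(4)
refs = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
A, b = calibrate_affine_stereo(refs, refs @ A_true.T + b_true)
p = triangulate(A, b, np.array([0.5, 0.2, 0.3]) @ A_true.T + b_true)
```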


Journal ArticleDOI
01 May 1993
TL;DR: This paper describes the use of a low level, computationally inexpensive closed form motion detector to define regions of interest within an image, based upon statistical measures, which requires only the first order properties of the image intensities.
Abstract: This paper describes the use of a low level, computationally inexpensive closed form motion detector to define regions of interest within an image, based upon statistical measures. The algorithm requires only the first order properties of the image intensities and does not require known camera motion. It has been tested on a variety of real imagery. A B-spline snake is initialised on the occluding contours of this region of interest.

63 citations
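A detector of this first-order kind can be sketched as a frame difference thresholded by its own statistics (the threshold constant `k` is an assumed tuning parameter, not taken from the paper):

```python
import numpy as np

def motion_roi(frame_a, frame_b, k=2.0):
    """Flag pixels whose temporal intensity change is statistically
    significant: |difference| exceeds its mean by k standard deviations.
    Uses only first-order properties of the intensities and needs no
    knowledge of camera motion."""
    d = np.abs(frame_b.astype(float) - frame_a.astype(float))
    return d > d.mean() + k * d.std()

# Synthetic test: a bright block moves one pixel to the right
a = np.zeros((32, 32)); a[10:20, 10:20] = 255.0
b = np.zeros((32, 32)); b[10:20, 11:21] = 255.0
mask = motion_roi(a, b)
```

The resulting mask marks the occluding boundary of the moving region, which is where a snake could then be initialised.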


Proceedings ArticleDOI
01 Jan 1993
TL;DR: It is shown that the geometric histograms used to record these distributions can be easily and robustly acquired from image data and can support recognition even when the shape extracted from the image is badly degraded by fragmentation noise and occlusion.
Abstract: We introduce a novel form of shape representation based on recording the distribution of pairwise geometric relationships between local shape features. It is shown that the geometric histograms used to record these distributions can be easily and robustly acquired from image data and can support recognition even when the shape extracted from the image is badly degraded by fragmentation noise and occlusion. Moreover, the processing involved in establishing correspondences between model and image features is both simple and parallel and has many advantages over previous search based methods.

47 citations
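The core of the representation, a histogram over pairwise geometric relationships between line features, might be sketched as follows (the bin counts, distance range and unit vote weighting are assumptions of this sketch; the paper weights entries by edge evidence):

```python
import numpy as np

def pairwise_geometric_histogram(segments, n_angle=8, n_dist=8, d_max=4.0):
    """Accumulate a 2D histogram of (relative angle, perpendicular
    distance) over all ordered pairs of line segments.  Each segment is
    ((x1, y1), (x2, y2)); angles are treated as directions modulo pi."""
    hist = np.zeros((n_angle, n_dist))
    segs = [np.asarray(s, dtype=float) for s in segments]
    for i, (p1, p2) in enumerate(segs):
        d = p2 - p1
        theta = np.arctan2(d[1], d[0])
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        for j, (q1, q2) in enumerate(segs):
            if i == j:
                continue
            mid = (q1 + q2) / 2.0
            rel = (np.arctan2((q2 - q1)[1], (q2 - q1)[0]) - theta) % np.pi
            dist = abs(np.dot(mid - p1, n))               # perpendicular offset
            ia = min(int(rel / np.pi * n_angle), n_angle - 1)
            idist = min(int(dist / d_max * n_dist), n_dist - 1)
            hist[ia, idist] += 1
    return hist

# Two parallel horizontal edges plus one perpendicular edge
segs = [((0, 0), (2, 0)), ((0, 0), (0, 2)), ((0, 1), (2, 1))]
h = pairwise_geometric_histogram(segs)
```

Because each entry depends only on local pairs, fragmentation or occlusion removes votes rather than corrupting the surviving ones, which is what makes matching degraded shapes feasible.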


Proceedings ArticleDOI
01 Jan 1993
TL;DR: This paper illustrates how these hybrid models can be used to extract facial bands and automatically segment a face image into meaningful regions, showing the benefits of simultaneous use of statistical and structural information.
Abstract: This paper details work done on face processing using a novel approach involving Hidden Markov Models. Experimental results from earlier work [14] indicated that left-to-right models with use of structural information yield better feature extraction than ergodic models. This paper illustrates how these hybrid models can be used to extract facial bands and automatically segment a face image into meaningful regions, showing the benefits of simultaneous use of statistical and structural information. It is shown how the segmented data can be used to identify different subjects. Successful segmentation and identification of face images was obtained, even when facial details (with/without glasses, smiling/non-smiling, open/closed eyes) were varied. Some experiments with a simple left-to-right model are presented to support the plausibility of this approach. Finally, present and future directions of research work using these models are indicated.

41 citations


Journal ArticleDOI
01 May 1993
TL;DR: This new approach allows VPs to be identified in less structured environments than its conventional counterparts can handle, and provides probability measures which reflect the likelihood of those points being the VPs.
Abstract: Commencing with a review of methods for vanishing point (VP) detection, a new approach is suggested. The proposed approach estimates the location of candidate vanishing points and provides probability measures which reflect the likelihood of those points being the VPs. This new approach allows VPs to be identified in less structured environments than its conventional counterparts can handle.

36 citations


Proceedings ArticleDOI
01 Jan 1993
TL;DR: The results demonstrate that the technique reduces the mean and variance of the background level in the accumulator array, that the peak to background level improves, and that thepeak width is reduced improving localisation of circle centres.
Abstract: We introduce a novel formulation of the Circle Hough Transform that we call the Coherent Circle Hough Transform. The technique uses phase to code for radii of circles. The usual simplifications of the Circle Hough Transform (CHT) are used in which lines pointing away from edge points are plotted rather than circles. Intersections of these "spokes" accumulate edge magnitude, or edge "energy", near the centres of circles. We introduce the use of a complex accumulator space and allow each spoke to vary in phase along its length. The spokes are in phase near the centre of circles and out of phase elsewhere. The spokes generated by noise are in random phase and destructively interfere. We present results for an isolated circle with additive white Gaussian noise for both the conventional Energy CHT and the new Coherent CHT. The results demonstrate that the technique reduces the mean and variance of the background level in the accumulator array, that the peak to background level improves, and that the peak width is reduced improving localisation of circle centres.

35 citations
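The phase-coding idea can be illustrated directly: each spoke carries a phase proportional to the distance travelled from its edge point, so spokes agree in phase only where their path lengths agree, at a circle's centre. A small sketch (the phase wavelength and spoke length are assumed parameters):

```python
import numpy as np

def coherent_cht(edge_points, gradients, shape, r_max=20, wavelength=20.0):
    """Coherent Circle Hough Transform sketch: each edge point casts a
    spoke along its inward gradient direction into a complex accumulator,
    with phase proportional to distance travelled.  Spokes from a common
    circle reach its centre with equal path length, hence equal phase;
    spokes from noise arrive in random phase and destructively interfere."""
    acc = np.zeros(shape, dtype=complex)
    for (x, y), (gx, gy) in zip(edge_points, gradients):
        norm = np.hypot(gx, gy)
        ux, uy = gx / norm, gy / norm
        for r in range(1, r_max):
            px, py = int(round(x + r * ux)), int(round(y + r * uy))
            if 0 <= px < shape[0] and 0 <= py < shape[1]:
                acc[px, py] += np.exp(2j * np.pi * r / wavelength)
    return np.abs(acc)

# Edge points on a circle of radius 8 centred at (32, 32)
angles = np.linspace(0, 2 * np.pi, 60, endpoint=False)
centre = np.array([32.0, 32.0])
pts = [centre + 8 * np.array([np.cos(a), np.sin(a)]) for a in angles]
grads = [centre - p for p in pts]            # inward-pointing gradients
energy = coherent_cht(pts, grads, (64, 64))
peak = np.unravel_index(np.argmax(energy), energy.shape)
```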


Proceedings ArticleDOI
01 Jan 1993
TL;DR: A new method is presented for determining the spatial attitude of a modelled 3D object from a single perspective image and for computing the covariance matrix associated with the attitude parameters.
Abstract: This paper presents a new method for determining the spatial attitude of a modelled 3D object from a single perspective image, and for computing the covariance matrix associated with the attitude parameters. Its principle is based on interpreting at least three image segments as the perspective projections of linear ridges of the object model, and on an iterative search (using Kalman filtering) for the model attitude consistent with these projections. Knowledge of the attitude and of the associated covariances enables a higher-level Kalman filter to track an object along an image sequence. In the tracking process this Kalman filter is used to predict the attitude of the object, and the error matrices are used to make robust automatic matches between the image segments and the model ridges. Tracking experiments have been carried out which demonstrate the validity of this approach. This work has been partially supported by a contract with the European Space Agency (ESA) in which the company Sagem is the prime contractor.

33 citations


Proceedings ArticleDOI
01 Jan 1993
TL;DR: A technique for classifying variable objects using flexible template models is described and applied to plant seeds, handprinted characters and human faces.
Abstract: A technique for classifying variable objects using flexible template models is described. The method is applied to the recognition of plant seeds, handprinted characters and human faces; quantitative results are presented.

31 citations


Proceedings ArticleDOI
01 Jan 1993
TL;DR: Recent developments are presented which have extended the scope of the model based vision system to include (i) multiple cameras, (ii) variable camera geometry, and (iii) articulated objects.
Abstract: Model based vision allows prior knowledge of the shape and appearance of specific objects to be used in the interpretation of a visual scene; it provides a powerful and natural way to enforce the view consistency constraint [1]. A model based vision system has been developed within ESPRIT VIEWS: P2152 which is able to classify and track moving objects (cars and other vehicles) in complex, cluttered traffic scenes. The fundamental basis of the method has been previously reported [2]. This paper presents recent developments which have extended the scope of the system to include (i) multiple cameras, (ii) variable camera geometry, and (iii) articulated objects. All three enhancements have easily been accommodated within the original model-based approach.

1 Review of methods

The models used consist of 3D geometrical representations of known objects (vehicles) together with calibrated camera and scene models [3]. Using the known camera and scene geometry, and given a provisional position and orientation (derived from data-driven detection of temporal change in the image), a 3D object can be instantiated into the 2D image plane and a "goodness-of-fit" score obtained by comparing the modelled features with the image. An iterative search in position-space and orientation-space is then used to maximize this evaluation score. At each step in the search the model is re-instantiated into the scene and a new goodness-of-fit score evaluated.
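The iterative position/orientation search can be sketched as simple hill-climbing over a goodness-of-fit score (the quadratic score below is a hypothetical stand-in for the paper's model instantiation and image comparison):

```python
import numpy as np

def hill_climb_pose(score, pose0, steps=(0.1, 0.1, 0.05), iters=200):
    """Iterative search in (x, y, orientation) space: repeatedly try
    single-parameter perturbations and keep any that improve the
    goodness-of-fit score; stop when no perturbation helps."""
    pose = np.asarray(pose0, float)
    best = score(pose)
    for _ in range(iters):
        improved = False
        for i, s in enumerate(steps):
            for delta in (+s, -s):
                cand = pose.copy()
                cand[i] += delta
                val = score(cand)
                if val > best:
                    pose, best, improved = cand, val, True
        if not improved:
            break
    return pose, best

# Hypothetical score peaked at the true pose (2.0, -1.0, 0.5)
target = np.array([2.0, -1.0, 0.5])
score = lambda p: -np.sum((p - target) ** 2)
pose, best = hill_climb_pose(score, (0.0, 0.0, 0.0))
```

In the real system each score evaluation re-instantiates the 3D model into the image plane, which is where almost all of the cost lies.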

Proceedings ArticleDOI
01 Jan 1993
TL;DR: The algorithm is developed and experiments are performed, demonstrating its superior performance in terms of estimate accuracy, convergence, robustness and segmentation quality.
Abstract: An application of robust statistics in a Hough transform based motion estimation approach is presented. The algorithm is developed and experiments are performed, demonstrating its superior performance in terms of estimate accuracy, convergence, robustness and segmentation quality. Comparative results with standard methods are also included.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: A real-time active surveillance vision system which detects moving objects in an everyday environment, directs the gaze of a head/eye platform towards the objects and subsequently pursues them smoothly, which can continue over extended periods.
Abstract: We describe the implementation of, and results from, a real-time active surveillance vision system which detects moving objects in an everyday environment, directs the gaze of a head/eye platform towards the objects and subsequently pursues them smoothly. Target detection and pursuit are performed purely on the basis of image motion, and can continue over extended periods. Two independent parallel processes derive (i) coarse resolution motion over the entire image to direct saccadic shifts in attention over a wide field of view, and (ii) fine resolution motion in a small central region of the image used to perform smooth-pursuit. A gaze controller which selects results from the two visual processes and controls the movement of the head platform is implemented as a finite state machine.
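The gaze controller's finite state machine might look like the following sketch (the three states mirror the watch/saccade/pursue behaviour described above; the transition conditions are assumptions of this sketch):

```python
from enum import Enum

class Gaze(Enum):
    WATCH = 0      # monitor coarse motion over the whole field of view
    SACCADE = 1    # rapid shift of attention towards a detected target
    PURSUE = 2     # smooth pursuit using fine central-region motion

def gaze_step(state, coarse_motion, fine_motion):
    """One transition of the gaze-controller FSM: saccade to peripheral
    motion, then pursue while the fine motion signal persists."""
    if state is Gaze.WATCH:
        return Gaze.SACCADE if coarse_motion else Gaze.WATCH
    if state is Gaze.SACCADE:
        return Gaze.PURSUE
    # PURSUE: fall back to watching when the fine motion signal is lost
    return Gaze.PURSUE if fine_motion else Gaze.WATCH

# A target appears, is fixated, pursued for two steps, then lost
states, s = [], Gaze.WATCH
for coarse, fine in [(False, False), (True, False), (True, True),
                     (False, True), (False, False)]:
    s = gaze_step(s, coarse, fine)
    states.append(s.name)
```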

Journal ArticleDOI
01 May 1993
TL;DR: A novel approach to junction detection using an explicit line finder model and contextual rules is presented and the most promising method, the polyhedral object face recovery, is briefly discussed.
Abstract: A novel approach to junction detection using an explicit line finder model and contextual rules is presented. Contextual rules expressing properties of 3D scenes (surface orientation discontinuities) limit the number of line intersections interpreted as junctions. A probabilistic relaxation labelling scheme is used to combine the a priori world knowledge represented by the contextual rules and the information contained in observed lines. Junctions corresponding to vertices (V-junctions) and occlusions (T-junctions) of 3D objects are detected and stored in a junction graph. The information in the junction graph is used to extract higher level features. Results of the most promising method, polyhedral object face recovery, are briefly discussed. The performance of the junction detection process is demonstrated on images from indoor, outdoor, and industrial environments.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: A novel algorithm is presented which makes effective use of the ground-plane constraint to derive pose estimates, and a form of the generalised Hough transform is used to group evidence from line features, and to identify approximate poses.
Abstract: Objects such as vehicles are often constrained to lie on a known plane. The ground-plane constraint reduces the problem of localisation and recognition from 6 dof to 3 dof. A novel algorithm is presented which makes effective use of the ground-plane constraint to derive pose estimates. A form of the generalised Hough transform is used to group evidence from line features, and to identify approximate poses. The single orientation parameter is decoupled from the two location parameters, and treated separately. The method is fast and robust. It copes well with complex outdoor scenes including multiple occluded objects, and image clutter from irrelevant structures.
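The decoupling of the single orientation parameter from the two location parameters can be illustrated with a one-dimensional orientation Hough over line directions (a simplified sketch; the paper accumulates evidence from full line features, not bare angles):

```python
import numpy as np

def orientation_hough(model_angles, image_angles, n_bins=360):
    """Vote for the single ground-plane rotation that aligns model line
    directions with observed image line directions.  Angles are treated
    as directions modulo pi; every model/image pairing casts one vote."""
    votes = np.zeros(n_bins)
    for m in model_angles:
        for a in image_angles:
            theta = (a - m) % np.pi
            votes[int(theta / np.pi * n_bins) % n_bins] += 1
    return np.argmax(votes) * np.pi / n_bins

# Model outline with three distinct line directions, observed rotated by 0.3 rad
model = [0.0, np.pi / 2, np.pi / 4]
observed = [(m + 0.3) % np.pi for m in model]
est = orientation_hough(model, observed)
```

Once the orientation peak is found, only a 2D search over location remains, which is what makes the decoupled approach fast.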

Journal ArticleDOI
01 May 1993
TL;DR: Two novel algorithms are presented for depth estimation using point correspondences and the ground plane constraint: a direct non-iterative method and a simple well-behaved iterative technique where the choice of initial value is straightforward.
Abstract: Two novel algorithms are presented in this paper for depth estimation using point correspondences and the ground plane constraint. One is a direct non-iterative method, and the other a simple well-behaved iterative technique where the choice of initial value is straightforward. The algorithms are capable of handling any number of points and frames as well as points which become occluded. Once the point depths are determined, motion parameters can be obtained by a linear least squares technique. Extensive test results are included which show that the proposed algorithms are robust to noise, and perform satisfactorily using real outdoor image sequences.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: Experimental results demonstrate that Faugeras and Toscani's approach has no advantage over the conventional approach from the practical point of view, and the latter is shown to be superior both in noise robustness and in computational cost.
Abstract: Camera calibration often entails the computation of the perspective transformation matrix. Conventionally, the matrix has been calculated by the standard linear least squares technique. Recently, Faugeras and Toscani have criticised the conventional approach for producing unsatisfactory, even "absurd", solutions, and have proposed an alternative approach. It is shown in this paper that their criticism of the conventional approach is misplaced and misleading. Experimental results demonstrate that Faugeras and Toscani's approach has no advantage over the conventional approach from the practical point of view. In fact, the latter is shown to be superior both in noise robustness and in computational cost. The paper also reports a method to resolve the possible sign ambiguities in the camera parameters computed by existing algorithms.
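The conventional approach the paper defends estimates the 3x4 perspective transformation matrix by standard linear least squares. A sketch with one common scale-fixing choice (fixing the bottom-right matrix entry to 1 is an assumption of this sketch, not necessarily the paper's normalisation):

```python
import numpy as np

def calibrate_dlt(world_pts, image_pts):
    """Estimate the 3x4 perspective transformation matrix by linear
    least squares, fixing P[2, 3] = 1 to remove the scale ambiguity.
    Each correspondence contributes two linear equations."""
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs += [u, v]
    p, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                            rcond=None)
    return np.append(p, 1.0).reshape(3, 4)

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Round-trip check with a hypothetical camera, normalised so P[2, 3] = 1
P_true = np.array([[800, 0, 320, 100], [0, 800, 240, 50], [0, 0, 1, 4]], float)
P_true = P_true / P_true[2, 3]
world = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
image = [project(P_true, np.array(w, float)) for w in world]
P = calibrate_dlt(world, image)
```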

Proceedings ArticleDOI
01 Jan 1993
TL;DR: A new method is presented for estimating, in the viewer coordinate system, the spatial attitude of an articulated object from a single perspective image, based on interpreting image lines as the perspective projections of linear ridges of the object model and on an iterative search for the model attitude consistent with these projections.
Abstract: This paper presents a new method for estimating, in the viewer coordinate system, the spatial attitude of an articulated object from a single perspective image. Its principle is based on interpreting some image lines as the perspective projections of linear ridges of the object model, and on an iterative search for the model attitude consistent with these projections. The method does not locate the different parts of the object separately, using for each a technique devoted to the localization of rigid objects; instead it computes a global attitude which respects the mechanical articulations of the object. The geometrical transformations applied to the model to bring it into the correct attitude are obtained in two steps. The first estimates the attitude parameters corresponding to a rotation and involves an iterative process. The second estimates the translation parameters by solving a linear system. The experiments presented concern the localization of robot arms from synthetic and real images. The former case allows the accuracy of the method to be assessed, since the final pose estimate can be compared with the attitude parameters used to create the synthetic image. The latter presents an experiment carried out in an industrial environment and involves the estimation of twelve parameters, since the observed robot arm has six internal degrees of freedom. The method can be useful in operations driven by remote control.

Proceedings ArticleDOI
21 Sep 1993
TL;DR: It is shown in this paper that affine shape with respect to 4 reference points can be obtained from two perspective images provided that the pair of images is affinely calibrated.
Abstract: It has been shown that relative projective shape, determined up to an unknown projective transformation, with respect to 5 reference points can be obtained from point-to-point correspondences of a pair of images, and that affine shape up to an unknown affine transformation with respect to 4 points can be obtained from parallel projection. We show in this paper that affine shape with respect to 4 reference points can be obtained from two perspective images provided that the pair of images is affinely calibrated. By affine calibration we mean the establishment of a special plane collineation between the two image planes; this collineation is the product of two plane collineations, each of which establishes a (1,1) correspondence between an image plane and the plane at infinity. Experimental results are also presented.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: It is shown that it is possible to establish the point correspondences uniquely in the sense that they yield a unique affine structure of the object and that, the computation is possible in polynomial time.
Abstract: In this paper the problem of computing the point correspondences in a sequence of time-varying images of a 3D object undergoing nonrigid (affine) motion is addressed. It is assumed that the images are obtained through affine projections. The correspondences are established only from the analysis of the unknown 3D affine structure of the object, without making use of any attributes of the feature points. It is shown that it is possible to establish the point correspondences uniquely (up to symmetry) in the sense that they yield a unique affine structure of the object and that the computation is possible in polynomial time. Two different algorithms for computing the point correspondences are presented. Results on various real image sequences, including a sequence containing independently moving objects, demonstrate the applicability of the structure based approach to motion correspondence.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: A parallel implementation of a texture segmentation algorithm that uses a Pearl Bayes Network to combine evidence for the location of urban regions in airborne infra-red linescan images and for the locations of driveable regions in autonomous land vehicle imagery is described.
Abstract: This paper describes a parallel implementation of a texture segmentation algorithm. The algorithm uses a Pearl Bayes Network (PBN) to combine evidence for the location of urban regions in airborne infra-red linescan images and for the location of driveable regions in autonomous land vehicle imagery. A multilevel PBN approach is introduced and followed by an example which is used to illustrate the derivation of the propagation and fusion equations. The parallel implementation is then described with results demonstrating its effectiveness.

Journal ArticleDOI
01 May 1993
TL;DR: To reduce the number of hypotheses generated from scene-model vertex assignments, and to recover the complete object pose, this paper proposes a composite feature, the vertex-CS feature, formed by combining a trihedral vertex and a V-junction which share a common edge.
Abstract: This paper considers the problem of recognising 3D polyhedral objects from a single perspective image. A hypothesis-verification paradigm based on a local shape representation is presented. In this framework, 2D vertices interpreted as the projection of a trihedral vertex (a 3D spatial vertex with three lines emanating from the tip) are employed as seed features for model invocation and hypothesis generation. To simplify the perspective analysis, Kanatani [7] has proposed an intuitive and elegant technique. Using this technique, we derive a fourth-degree polynomial for interpreting a trihedral vertex. The contribution of our solution is that there are no restrictions on the angles between the vertex edges. To reduce the number of hypotheses generated from scene-model vertex assignments, and to recover the complete object pose, we propose a composite feature, the vertex-CS feature, formed by combining a trihedral vertex and a V-junction which share a common edge. The geometric constraint of this composite feature is derived. A matching strategy used in the recognition system is discussed. The feasibility of the proposed method is illustrated on real data.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: Geon theory as mentioned in this paper offers an account of this phenomenon characterized by four general assumptions: a) objects are represented as an arrangement of simple convex or singly concave parts (geons); b) the geons can be distinguished by binary contrasts (differences) in viewpoint invariant properties, such as straight vs. curved, rather than metric properties such as degree of curvature; c) the relations among geons are explicit, such as PERPENDICULAR-TO or TOP-OF, as part of a structural description, rather than implicit in a coordinate space; and d) a relatively small number of geons is sufficient.
Abstract: In a fraction of a second humans are able to comprehend novel images of objects and scenes. Indeed, the human represents the only existence proof that a general shape recognizer is even possible. Geon theory offers an account of this phenomenon characterized by four general assumptions: a) Objects are represented as an arrangement of simple convex or singly concave parts (geons), b) The geons can be distinguished by binary contrasts (differences) in viewpoint invariant properties, such as straight vs. curved, rather than metric properties such as degree of curvature, c) The relations among the geons are explicit, such as PERPENDICULAR-TO or TOP-OF, as part of a structural description, rather than implicit in a coordinate space, and d) A relatively small number of geons is sufficient. Recent research evaluating these assumptions is reviewed.

Proceedings ArticleDOI
23 Sep 1993
TL;DR: This novel algorithm is based on motion parallax, but uses sparse visual motion estimates to extract the direction of translation of the camera directly, after which determination of thecamera rotation and the depths of the image features follows easily.
Abstract: Determining the motion of a camera from its image sequences has so far proved very difficult, and no practical algorithms have been found for freely moving cameras. This novel algorithm is based on motion parallax, but uses sparse visual motion estimates to extract the direction of translation of the camera directly, after which determination of the camera rotation and the depths of the image features follows easily. This method can also detect and reject independent motion, and provide a measure of the uncertainty of its estimates.

Journal ArticleDOI
01 May 1993
TL;DR: This paper presents a means of segmenting planar regions from two views of a scene using point correspondences, and a novel motion direction estimator suggests itself.
Abstract: This paper presents a means of segmenting planar regions from two views of a scene using point correspondences. The initial selection of groups of coplanar points is performed on the basis of conservation of two five point projective invariants (groups for which this invariant is conserved are assumed to be coplanar). The correspondences for four of the five points are used to define a projectivity which is used to predict the change in position of other points assuming they lie on the same plane as the original four. A distance threshold between actual and predicted position is used to find extended planar regions. If two distinct planar regions can be found then a novel motion direction estimator suggests itself.
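The prediction step, fitting a projectivity to four correspondences and testing whether further points obey it, can be sketched as follows (the distance threshold is an assumed parameter; the paper's initial coplanar grouping via five-point invariants is omitted):

```python
import numpy as np

def homography_from_points(src, dst):
    """Projectivity (3x3 homography) from four point correspondences,
    solved linearly with H[2, 2] fixed to 1 (one common normalisation)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, p):
    x = H @ np.array([p[0], p[1], 1.0])
    return (x[0] / x[2], x[1] / x[2])

def on_plane(H, p, q, tol=1.0):
    """Is p's predicted position within tol (pixels) of its match q?"""
    u, v = apply_h(H, p)
    return np.hypot(u - q[0], v - q[1]) < tol

# A hypothetical plane-induced homography generates 5 correspondences;
# H is fitted from the first 4 and used to test the 5th.
H_true = np.array([[1.2, 0.1, 5.0], [0.0, 0.9, -3.0], [0.001, 0.002, 1.0]])
src = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
dst = [apply_h(H_true, p) for p in src]
H = homography_from_points(src[:4], dst[:4])
coplanar = on_plane(H, src[4], dst[4])     # consistent with the plane
outlier = on_plane(H, src[4], (0.0, 0.0))  # a mismatched correspondence
```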

Proceedings ArticleDOI
23 Sep 1993
TL;DR: The power of 3D affine invariants in an object recognition scheme is evaluated and some of the noise problems arising due to the weak perspective approximation and corner localisation errors are discussed.
Abstract: We evaluate the power of 3D affine invariants in an object recognition scheme. These invariants are actively calculated by the real-time tracking of 2D image features (corners) over an image sequence. This is done optimally by using a Kalman filter. Object information is located in a hash table where it is stored and retrieved using the invariants as stable indices. Recognition takes place when significant evidence for a particular shape has been found from the table. Preliminary results with real data are presented, and some of the noise problems arising due to the weak perspective approximation and corner localisation errors are discussed.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: In this paper, the structure of a 3D point set with a single bilateral symmetry was reconstructed from an uncalibrated affine image, modulo a Euclidean transformation, up to a four parameter family of symmetric objects that could have given rise to the image.
Abstract: We demonstrate that the structure of a 3D point set with a single bilateral symmetry can be reconstructed from an uncalibrated affine image, modulo a Euclidean transformation, up to a four parameter family of symmetric objects that could have given rise to the image. If the object has two orthogonal bilateral symmetries, its shape can be reconstructed, modulo a Euclidean transformation, up to a three parameter family of symmetric shapes that could have given rise to the image. Furthermore, if the camera aspect ratio is known, the three parameter family reduces to a single scale and the orientation of the object can be determined. These results are demonstrated using real images with uncalibrated cameras.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: In this paper, a method for active fixation in the context of recognition of man-made objects characterized by their shapes is proposed. But it is based on a grouping strategy, which forms sets of connected junctions separated from the surrounding at depth discontinuities.
Abstract: It is well-known that active selection of fixation points in humans is highly context and task dependent. It is therefore likely that successful computational processes for fixation in active vision should be so too. We consider active fixation in the context of recognition of man-made objects characterized by their shapes. In this situation the qualitative shape and type of observed junctions play an important role. The fixations are driven by a grouping strategy, which forms sets of connected junctions separated from the surroundings at depth discontinuities. We have furthermore developed a methodology for rapid active detection and classification of junctions by selection of fixation points. The approach is based on direct computations from image data and allows integration of stereo and accommodation cues with luminance information. This work forms part of an effort to perform active recognition of generic objects, in the spirit of Malik and Biederman, but on real imagery rather than on line-drawings.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: A natural generalisation of the usual accumulator method which incorporates statistical hypothesis testing to account for the effects of noise and errors in line segment parameters and develops an optimisation scheme as a post-process to remove sampling errors in the vanishing point accumulator.
Abstract: In this paper we use line segments from a Hough transform algorithm to locate vanishing points in an image. The line parameters have already been determined to high accuracy, and the purpose of this paper is to present a scheme for locating the vanishing points from the line intersections which takes full advantage of this accuracy. We present a natural generalisation of the usual accumulator method which incorporates statistical hypothesis testing to account for the effects of noise and errors in line segment parameters. Using this smooth voting kernel in the accumulation process, we have developed an optimisation scheme as a post-process to remove sampling errors in the vanishing point accumulator. We demonstrate the improvement in the results using synthetic imagery for which ground truth is known. We then demonstrate the algorithm on two images of outdoor scenes. The first is a road scene for which we determine vanishing points for a building in the street, and the second is an infra-red image of a runway as seen from an approaching aircraft.
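Locating a vanishing point from accurate line parameters reduces, in the noise-free limit, to a least-squares intersection of the lines. A sketch of that core step (this simplification drops the paper's hypothesis-testing accumulator and optimisation post-process):

```python
import numpy as np

def vanishing_point(lines):
    """Least-squares vanishing point from lines in homogeneous form
    (a, b, c) with ax + by + c = 0: minimise the summed squared
    residuals of the point against every line."""
    A = np.array([l[:2] for l in lines], float)
    c = -np.array([l[2] for l in lines], float)
    vp, *_ = np.linalg.lstsq(A, c, rcond=None)
    return vp

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

# Three lines through (10, 5); the third is slightly perturbed
lines = [line_through((10, 5), (0, 0)),
         line_through((10, 5), (0, 10)),
         line_through((10.01, 5.01), (5, 0))]
vp = vanishing_point(lines)
```

Weighting each line's residual by its estimated parameter uncertainty would recover the flavour of the statistical test the paper builds into its accumulator.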

Proceedings ArticleDOI
01 Jan 1993
TL;DR: This paper presents a model-based method for colour-based recognition of objects from a large database based on the assumption that surface reflectances of objects in the model database follow the extended dichromatic model proposed by Shafer.
Abstract: This paper presents a model-based method for colour-based recognition of objects from a large database. The algorithm is based on the assumption that surface reflectances of objects in the model database follow the extended dichromatic model proposed by Shafer [Sha84]. Adoption of the dichromatic model allows recovery of body colour, the component of sensor responses (RGB-values) that is independent of scene geometry and illumination intensity. Both theoretical studies [Hea89b] and experiments [LB90][KSK88] confirm that Shafer's model gives a suitable approximation for the reflectances of a wide range of materials. Instead of using traditional techniques (e.g. clustering, split-and-merge) to obtain regions of 'similarly' coloured pixels followed by classification, a novel approach is argued for. First, for each pixel a list of models with nonzero a posteriori probabilities P(model_i | body colour) is computed using Bayes' formula. Next, regions are formed by grouping pixels with an identical most probable hypothesis. Probabilities P(model_i | region) are obtained through a standard group decision rule [FT80]. We show that the proposed scheme can be used for a number of visual tasks: localization of objects, and generation and verification of object hypotheses. Experiments on images of complex indoor scenes confirm that the proposed method can provide reliable information about the surrounding environment.
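The per-pixel Bayes step can be sketched as follows (the Gaussian body-colour densities and the priors here are assumptions of this sketch, not the paper's model):

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian at x."""
    d = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** len(mean) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ inv @ d) / norm

def classify_pixels(pixels, models, priors):
    """For each RGB pixel, compute P(model_i | colour) by Bayes' formula
    and return the index of the most probable model.  models is a list
    of (mean, cov) body-colour densities."""
    labels = []
    for x in pixels:
        likelihoods = np.array([gaussian_pdf(x, m, c) for m, c in models])
        post = likelihoods * priors
        post = post / post.sum()
        labels.append(int(np.argmax(post)))
    return labels

# Two hypothetical object body colours: reddish and greenish
models = [(np.array([200.0, 60.0, 50.0]), 100 * np.eye(3)),
          (np.array([60.0, 180.0, 70.0]), 100 * np.eye(3))]
priors = np.array([0.5, 0.5])
labels = classify_pixels([np.array([190.0, 70.0, 55.0]),
                          np.array([65.0, 175.0, 80.0])], models, priors)
```

Grouping adjacent pixels that share the same most probable label then yields the regions to which the group decision rule is applied.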