# Showing papers in "International Journal of Computer Vision in 1998"

••

TL;DR: The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set.

Abstract: The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.
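
The propagation of a weighted random sample set over time can be illustrated with a minimal one-dimensional sketch. It is not the authors' curve tracker: the scalar state, the constant-drift dynamics, the Gaussian observation likelihood, and all names and parameter values (`condensation_step`, `drift`, `process_std`, `obs_std`) are assumptions made for illustration only.

```python
import math
import random


def condensation_step(samples, weights, observation, rng,
                      drift=1.0, process_std=0.5, obs_std=1.0):
    """One select / predict / measure iteration on a 1D sample set."""
    n = len(samples)
    # Select: resample with probability proportional to the old weights.
    chosen = rng.choices(samples, weights=weights, k=n)
    # Predict: apply the assumed dynamics (deterministic drift plus diffusion).
    predicted = [s + drift + rng.gauss(0.0, process_std) for s in chosen]
    # Measure: weight each sample by an assumed Gaussian observation likelihood.
    likelihoods = [math.exp(-0.5 * ((observation - s) / obs_std) ** 2)
                   for s in predicted]
    total = sum(likelihoods)
    if total == 0.0:
        # Every likelihood underflowed: fall back to uniform weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in likelihoods]
    return predicted, new_weights


def weighted_mean(samples, weights):
    """Point estimate of the state: the mean of the weighted sample set."""
    return sum(s * w for s, w in zip(samples, weights))


def run_demo(steps=20, n=500, seed=0):
    """Track a target that moves one unit per step, observed with noise."""
    rng = random.Random(seed)
    samples = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    weights = [1.0 / n] * n
    truth = 0.0
    for _ in range(steps):
        truth += 1.0
        observation = truth + rng.gauss(0.0, 0.5)
        samples, weights = condensation_step(samples, weights, observation, rng)
    return truth, weighted_mean(samples, weights)
```

Because the density is carried by samples rather than by a mean and covariance, the same loop can in principle represent several competing hypotheses at once, which is the property the abstract contrasts with Kalman filtering.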

5,804 citations

••

TL;DR: It is shown how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation and how it can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure.

Abstract: The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is presented for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure.
Support for the proposed approach is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by integration with different types of early visual modules, including experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation.
In many computer vision applications, the poor performance of the low-level vision modules constitutes a major bottleneck. It is argued that the inclusion of mechanisms for automatic scale selection is essential if we are to construct vision systems to automatically analyse complex unknown environments.
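
The principle of selecting scales at local extrema of γ-normalized derivatives can be sketched in one dimension. The following is a toy illustration under stated assumptions, not the article's detectors: the signal is a 1D Gaussian blob of variance `T0`, smoothing uses a sampled Gaussian kernel, the second derivative is a central difference, and the helper names are hypothetical. The second-order γ-normalized derivative is taken as `t**gamma * L_xx`, with `t` the scale parameter (variance of the smoothing kernel).

```python
import math


def gaussian_kernel(t):
    """Sampled, normalised 1D Gaussian of variance t, truncated at 6 sigma."""
    radius = int(math.ceil(6.0 * math.sqrt(t)))
    raw = [math.exp(-(k * k) / (2.0 * t)) for k in range(-radius, radius + 1)]
    total = sum(raw)
    return radius, [v / total for v in raw]


def smoothed_at(signal, x, t):
    """Scale-space value L(x; t): the signal convolved with the kernel at x."""
    radius, kernel = gaussian_kernel(t)
    offsets = range(-radius, radius + 1)
    return sum(w * signal(x - k) for k, w in zip(offsets, kernel))


def normalized_second_derivative(signal, x, t, gamma=1.0):
    """t**gamma times a central-difference estimate of L_xx at (x; t)."""
    lxx = (smoothed_at(signal, x - 1, t)
           - 2.0 * smoothed_at(signal, x, t)
           + smoothed_at(signal, x + 1, t))
    return (t ** gamma) * lxx


def select_scale(signal, x, scales, gamma=1.0):
    """Return the scale whose normalised response has the largest magnitude."""
    return max(scales,
               key=lambda t: abs(normalized_second_derivative(signal, x, t, gamma)))


T0 = 16.0  # variance of the test blob (illustrative value)


def blob(x):
    """Unnormalised Gaussian blob centred at x = 0."""
    return math.exp(-(x * x) / (2.0 * T0))


best_scale = select_scale(blob, 0, range(1, 65))
```

For this 1D blob with γ = 1, the continuous response at the centre is proportional to t / (T0 + t)^(3/2), which peaks at t = 2·T0, so the selected scale is expected to land close to 32 and to grow in proportion to the blob size.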

2,942 citations

••

TL;DR: A “subspace constancy assumption” is defined that allows techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image.

Abstract: This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second, we define a “subspace constancy assumption” that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image, we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular, we use this “EigenTracking” technique to track and recognize the gestures of a moving hand.

1,343 citations

••

TL;DR: A complete review of the current techniques for estimating the fundamental matrix and its uncertainty is provided, and a well-founded measure is proposed to compare these techniques.

Abstract: Two images of a single scene/object are related by the epipolar geometry, which can be described by a 3×3 singular matrix called the essential matrix if the images' internal parameters are known, or the fundamental matrix otherwise. It captures all geometric information contained in two images, and its determination is very important in many applications such as scene modeling and vehicle navigation. This paper gives an introduction to the epipolar geometry, and provides a complete review of the current techniques for estimating the fundamental matrix and its uncertainty. A well-founded measure is proposed to compare these techniques. Projective reconstruction is also reviewed. The software which we have developed for this review is available on the Internet.

1,217 citations

••

TL;DR: A mechanism is presented for automatic selection of scale levels when detecting one-dimensional image features, such as edges and ridges; for ridges, the selected scales on a scale-space ridge reflect the width of the ridge.

Abstract: When computing descriptors of image data, the type of information that can be extracted may be strongly dependent on the scales at which the image operators are applied. This article presents a systematic methodology for addressing this problem. A mechanism is presented for automatic selection of scale levels when detecting one-dimensional image features, such as edges and ridges.
A novel concept of a scale-space edge is introduced, defined as a connected set of points in scale-space at which: (i) the gradient magnitude assumes a local maximum in the gradient direction, and (ii) a normalized measure of the strength of the edge response is locally maximal over scales. An important consequence of this definition is that it allows the scale levels to vary along the edge. Two specific measures of edge strength are analyzed in detail, the gradient magnitude and a differential expression derived from the third-order derivative in the gradient direction. For a certain way of normalizing these differential descriptors, by expressing them in terms of so-called γ-normalized derivatives, an immediate consequence of this definition is that the edge detector will adapt its scale levels to the local image structure. Specifically, sharp edges will be detected at fine scales so as to reduce the shape distortions due to scale-space smoothing, whereas sufficiently coarse scales will be selected at diffuse edges, such that an edge model is a valid abstraction of the intensity profile across the edge.
Since the scale-space edge is defined from the intersection of two zero-crossing surfaces in scale-space, the edges will by definition form closed curves. This simplifies selection of salient edges, and a novel significance measure is proposed, by integrating the edge strength along the edge. Moreover, the scale information associated with each edge provides useful clues to the physical nature of the edge.
With just slight modifications, similar ideas can be used for formulating ridge detectors with automatic scale selection, having the characteristic property that the selected scales on a scale-space ridge instead reflect the width of the ridge.
It is shown how the methodology can be implemented in terms of straightforward visual front-end operations, and the validity of the approach is supported by theoretical analysis as well as experiments on real-world and synthetic data.

1,021 citations

••

TL;DR: A new method for separating and recovering the motion and shape of multiple independently moving objects in a sequence of images by introducing a mathematical construct of object shapes, called the shape interaction matrix, which is invariant to both the object motions and the selection of coordinate systems.

Abstract: The structure-from-motion problem has been extensively studied in the field of computer vision. Yet, the bulk of the existing work assumes that the scene contains only a single moving object. The more realistic case where an unknown number of objects move in the scene has received little attention, especially for its theoretical treatment. In this paper we present a new method for separating and recovering the motion and shape of multiple independently moving objects in a sequence of images. The method does not require prior knowledge of the number of objects, nor does it depend on any grouping of features into an object at the image level. For this purpose, we introduce a mathematical construct of object shapes, called the shape interaction matrix, which is invariant to both the object motions and the selection of coordinate systems. This invariant structure is computable solely from the observed trajectories of image features without grouping them into individual objects. Once the matrix is computed, it allows for segmenting features into objects by the process of transforming it into a canonical form, as well as recovering the shape and motion of each object. The theory works under a broad set of projection models (scaled orthography, paraperspective and affine) but they must be linear, so it excludes projective “cameras”.
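
In symbols, the construct can be summarised as follows. The notation is an assumption chosen here for illustration (the abstract gives none): $W$ is the $2F \times P$ matrix of image coordinates of $P$ tracked features over $F$ frames, and $r$ is its rank in the noise-free case.

$$
W = U \Sigma V^{\top}, \qquad V \in \mathbb{R}^{P \times r}, \qquad Q = V V^{\top} \in \mathbb{R}^{P \times P}
$$

$$
Q_{ij} = 0 \quad \text{whenever features } i \text{ and } j \text{ belong to different, independently moving objects.}
$$

Under this reading, permuting the features so that $Q$ becomes block diagonal is the "canonical form" that groups features into objects; with noisy data the off-block entries are only approximately zero.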

778 citations

••

TL;DR: The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than the previous MRF models used for texture modeling.

Abstract: This article presents a statistical theory for texture modeling. This theory combines filtering theory and Markov random field modeling through the maximum entropy principle, and interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view. Our theory characterizes the ensemble of images I with the same texture appearance by a probability distribution f(I) on a random field, and the objective of texture modeling is to make inference about f(I), given a set of observed texture examples. In our theory, texture modeling consists of two steps. (1) A set of filters is selected from a general filter bank to capture features of the texture, these filters are applied to observed texture images, and the histograms of the filtered images are extracted. These histograms are estimates of the marginal distributions of f(I). This step is called feature extraction. (2) The maximum entropy principle is employed to derive a distribution p(I), which is restricted to have the same marginal distributions as those in (1). This p(I) is considered as an estimate of f(I). This step is called feature fusion. A stepwise algorithm is proposed to choose filters from a general filter bank. The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than the previous MRF models used for texture modeling. A Gibbs sampler is adopted to synthesize texture images by drawing typical samples from p(I); the model is thus verified by seeing whether the synthesized texture images have similar visual appearances to the texture images being modeled. Experiments on a variety of 1D and 2D textures are described to illustrate our theory and to show the performance of our algorithms. These experiments demonstrate that many textures which were previously considered to be from different categories can be modeled and synthesized in a common framework.

746 citations

••

TL;DR: This paper presents a method that uses the level sets of volumes to reconstruct the shapes of 3D objects from range data, gives an analytical characterization of the surface that maximizes the posterior probability, and introduces a novel computational technique for level-set modeling, called the sparse-field algorithm.

Abstract: This paper presents a method that uses the level sets of volumes to reconstruct the shapes of 3D objects from range data. The strategy is to formulate 3D reconstruction as a statistical problem: find that surface which is most likely, given the data and some prior knowledge about the application domain. The resulting optimization problem is solved by an incremental process of deformation. We represent a deformable surface as the level set of a discretely sampled scalar function of three dimensions, i.e., a volume. Such level-set models have been shown to mimic conventional deformable surface models by encoding surface movements as changes in the greyscale values of the volume. The result is a voxel-based modeling technology that offers several advantages over conventional parametric models, including flexible topology, no need for reparameterization, concise descriptions of differential structure, and a natural scale space for hierarchical representations. This paper builds on previous work in both 3D reconstruction and level-set modeling. It presents a fundamental result in surface estimation from range data: an analytical characterization of the surface that maximizes the posterior probability. It also presents a novel computational technique for level-set modeling, called the sparse-field algorithm, which combines the advantages of a level-set approach with the computational efficiency and accuracy of a parametric representation. The sparse-field algorithm is more efficient than other approaches, and because it assigns the level set to a specific set of grid points, it positions the level-set model more accurately than the grid itself. These properties, computational efficiency and subcell accuracy, are essential when trying to reconstruct the shapes of 3D objects. Results are shown for the reconstruction of objects from sets of noisy and overlapping range maps.

593 citations

••

Yale University

TL;DR: It is proved that the set of n-pixel images of a convex object with a Lambertian reflectance function, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in R^n and that the dimension of this illumination cone equals the number of distinct surface normals.

Abstract: The appearance of an object depends on both the viewpoint from which it is observed and the light sources by which it is illuminated. If the appearance of two objects is never identical for any pose or lighting conditions, then, in theory, the objects can always be distinguished or recognized. The question arises: What is the set of images of an object under all lighting conditions and pose? In this paper, we consider only the set of images of an object under variable illumination, including multiple, extended light sources and shadows. We prove that the set of n-pixel images of a convex object with a Lambertian reflectance function, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in R^n and that the dimension of this illumination cone equals the number of distinct surface normals. Furthermore, the illumination cone can be constructed from as few as three images. In addition, the set of n-pixel images of an object of any shape and with a more general reflectance function, seen under all possible illumination conditions, still forms a convex cone in R^n. Extensions of these results to color images are presented. These results immediately suggest certain approaches to object recognition. Throughout, we present results demonstrating the illumination cone representation.
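
The convexity claim for the Lambertian case can be sketched in a few lines. The notation is an assumption introduced here: $B \in \mathbb{R}^{n \times 3}$ has one row per pixel (albedo times surface normal), $s_k \in \mathbb{R}^3$ encodes the direction and strength of the $k$-th distant point source, and $\max(\cdot, 0)$ is applied per pixel to model attached shadows.

$$
x = \sum_{k} \max(B s_k, 0), \qquad \mathcal{C} = \Big\{ \sum_{k} \max(B s_k, 0) \;:\; s_k \in \mathbb{R}^3 \Big\}
$$

$$
\alpha \max(B s, 0) = \max\big(B(\alpha s), 0\big) \ \text{ for } \alpha \ge 0 \quad\Longrightarrow\quad \alpha x + \beta y \in \mathcal{C} \ \text{ for all } x, y \in \mathcal{C},\ \alpha, \beta \ge 0 .
$$

Scaling an image corresponds to scaling the source strengths, and adding two images corresponds to switching on both sets of sources, so the set is closed under non-negative combinations, which is what makes it a convex cone.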

574 citations

••

TL;DR: This paper constructs a distance between deformations defined through a metric given the cost of infinitesimal deformations, and proposes a numerical scheme to solve a variational problem involving this distance and leading to a sub-optimal gradient pattern matching.

Abstract: In a previous paper, it was proposed to see the deformations of a common pattern as the action of an infinite dimensional group. We show in this paper that this approach can be applied numerically for pattern matching in image analysis of digital images. Using Lie group ideas, we construct a distance between deformations defined through a metric given the cost of infinitesimal deformations. Then we propose a numerical scheme to solve a variational problem involving this distance and leading to a sub-optimal gradient pattern matching. Its links with fluid models are established.

391 citations

••

TL;DR: This paper addresses this problem with some novel algorithms based on iteratively diffusing support at different disparity hypotheses, and develops a novel Bayesian estimation technique, which significantly outperforms techniques based on area-based matching (SSD) and regular diffusion.

Abstract: One of the central problems in stereo matching (and other image registration tasks) is the selection of optimal window sizes for comparing image regions. This paper addresses this problem with some novel algorithms based on iteratively diffusing support at different disparity hypotheses, and locally controlling the amount of diffusion based on the current quality of the disparity estimate. It also develops a novel Bayesian estimation technique, which significantly outperforms techniques based on area-based matching (SSD) and regular diffusion. We provide experimental results on both synthetic and real stereo image pairs.

••

TL;DR: It is shown how a new approach to the numerical approximation of differential invariants, based on suitable combination of joint invariants of the underlying group action, allows one to numerically compute differential invariant signatures in a fully group-invariant manner.

Abstract: We introduce a new paradigm, the differential invariant signature curve or manifold, for the invariant recognition of visual objects. A general theorem of E. Cartan implies that two curves are related by a group transformation if and only if their signature curves are identical. The important examples of the Euclidean and equi-affine groups are discussed in detail. Secondly, we show how a new approach to the numerical approximation of differential invariants, based on suitable combination of joint invariants of the underlying group action, allows one to numerically compute differential invariant signatures in a fully group-invariant manner. Applications to a variety of fundamental issues in vision, including detection of symmetries, visual tracking, and reconstruction of occlusions, are discussed.

••

TL;DR: A class of broadband operators are proposed that, when used together, provide invariance to scene texture and produce accurate and dense depth maps and a depth confidence measure is derived that can be computed from the outputs of the operators.

Abstract: A fundamental problem in depth from defocus is the measurement of relative defocus between images. The performance of previously proposed focus operators is inevitably sensitive to the frequency spectra of local scene textures. As a result, focus operators such as the Laplacian of Gaussian result in poor depth estimates. An alternative is to use large filter banks that densely sample the frequency space. Though this approach can result in better depth accuracy, it sacrifices the computational efficiency that depth from defocus offers over stereo and structure from motion. We propose a class of broadband operators that, when used together, provide invariance to scene texture and produce accurate and dense depth maps. Since the operators are broadband, a small number of them are sufficient for depth estimation of scenes with complex textural properties. In addition, a depth confidence measure is derived that can be computed from the outputs of the operators. This confidence measure permits further refinement of computed depth maps. Experiments are conducted on both synthetic and real scenes to evaluate the performance of the proposed operators. The depth detection gain error is less than 1%, irrespective of texture frequency. Depth accuracy is found to be 0.5∼1.2% of the distance of the object from the imaging optics.

••

TL;DR: A statistical approach is used to estimate the grouping of points to subspaces in the presence of noise by computing which partition has the maximum likelihood.

Abstract: We want to deduce, from a sequence of noisy two-dimensional images of a scene of several rigid bodies moving independently in three dimensions, the number of bodies and the grouping of given feature points in the images to the bodies. Prior processing is assumed to have identified features or points common to all frames and the images are assumed to be created by orthographic projection (i.e., perspective effects are minimal). We describe a computationally inexpensive algorithm that can determine which points or features belong to which rigid body using the fact that, with exact observations in orthographic projection, points on a single body lie in a linear manifold of frame space of dimension three or less. If there are enough observations and independent motions, these manifolds can be viewed as a set of linearly independent subspaces of dimension four or less. We show that the row echelon canonical form provides direct information on the grouping of points to these subspaces. Treatment of the noise is the most difficult part of the problem. This paper uses a statistical approach to estimate the grouping of points to subspaces in the presence of noise by computing which partition has the maximum likelihood. The input data is assumed to be contaminated with independent Gaussian noise. The algorithm can base its estimates on a user-supplied standard deviation of the noise, or it can estimate the noise from the data. The algorithm can also be used to estimate the probability of a user-specified partition so that the hypothesis can be combined with others using Bayesian statistics.

••

TL;DR: A new technique is described for synthesizing images of faces from new viewpoints, when only a single 2D image is available, which is interesting for view independent face recognition tasks as well as for image synthesis problems in areas like teleconferencing and virtualized reality.

Abstract: Images formed by a human face change with viewpoint. A new technique is described for synthesizing images of faces from new viewpoints, when only a single 2D image is available. A novel 2D image of a face can be computed without explicitly computing the 3D structure of the head. The technique draws on a single generic 3D model of a human head and on prior knowledge of faces based on example images of other faces seen in different poses. The example images are used to “learn” a pose-invariant shape and texture description of a new face. The 3D model is used to solve the correspondence problem between images showing faces in different poses.
The proposed method is interesting for view independent face recognition tasks as well as for image synthesis problems in areas like teleconferencing and virtualized reality.

••

TL;DR: A form of the generalised Hough transform is used in conjunction with explicit probability-based voting models to find consistent matches and to identify the approximate poses of vehicles in traffic scenes, which under normal conditions stand on the ground-plane.

Abstract: Objects are often constrained to lie on a known plane. This paper concerns the pose determination and recognition of vehicles in traffic scenes, which under normal conditions stand on the ground-plane. The ground-plane constraint reduces the problem of localisation and recognition from 6 dof to 3 dof.
The ground-plane constraint significantly reduces the pose redundancy of 2D image and 3D model line matches. A form of the generalised Hough transform is used in conjunction with explicit probability-based voting models to find consistent matches and to identify the approximate poses. The algorithms are applied to images of several outdoor traffic scenes and successful results are obtained. The work reported in this paper illustrates the efficiency and robustness of context-based vision in a practical application of computer vision.
Multiple cameras may be used to overcome the limitations of a single camera. Data fusion in the proposed algorithms is shown to be simple and straightforward.

••

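TL;DR: Two new robust optic flow methods are introduced: one uses the Least Median of Squares to detect outliers and then solves for the inliers by least squares, and the other uses the Least Median of Squares Orthogonal Distances followed by total least squares. Both outperform other published methods in accuracy and robustness.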

Abstract: This paper formulates the optic flow problem as a set of over-determined simultaneous linear equations. It then introduces and studies two new robust optic flow methods. The first technique is based on using the Least Median of Squares (LMedS) to detect the outliers. Then, the inlier group is solved using the least squares technique. The second method employs a new robust statistical method named the Least Median of Squares Orthogonal Distances (LMSOD) to identify the outliers and then uses total least squares to solve the optic flow problem. The performance of both methods is studied by experiments on synthetic and real image sequences. These methods outperform other published methods both in accuracy and robustness.
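
The first technique (LMedS outlier detection followed by least squares on the inliers) can be sketched as below. This is a toy under stated assumptions rather than the paper's implementation: each equation is written as `a*u + b*v = c` (for example `a = Ix`, `b = Iy`, `c = -It` from the brightness constancy constraint), candidates come from random pairs of equations, and the robust scale estimate `1.4826 * (1 + 5/(n - p)) * sqrt(median)` with a 2.5-sigma cut-off is the conventional LMedS choice, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random
import statistics


def solve_pair(r1, r2):
    """Solve two equations a*u + b*v = c exactly; None if nearly singular."""
    (a1, b1, c1), (a2, b2, c2) = r1, r2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def residual(row, flow):
    a, b, c = row
    return a * flow[0] + b * flow[1] - c


def least_squares(rows):
    """Ordinary least squares for (u, v) via the 2x2 normal equations."""
    sxx = sum(a * a for a, _, _ in rows)
    sxy = sum(a * b for a, b, _ in rows)
    syy = sum(b * b for _, b, _ in rows)
    sxc = sum(a * c for a, _, c in rows)
    syc = sum(b * c for _, b, c in rows)
    det = sxx * syy - sxy * sxy
    return ((sxc * syy - syc * sxy) / det, (sxx * syc - sxy * sxc) / det)


def lmeds_flow(rows, rng, trials=200):
    """LMedS outlier detection followed by least squares on the inliers."""
    best, best_med = None, float("inf")
    for _ in range(trials):
        candidate = solve_pair(*rng.sample(rows, 2))
        if candidate is None:
            continue
        med = statistics.median(residual(r, candidate) ** 2 for r in rows)
        if med < best_med:
            best, best_med = candidate, med
    if best is None:
        raise ValueError("no non-degenerate pair of equations was sampled")
    # Robust scale estimate from the minimal median (two unknowns, so p = 2).
    sigma = 1.4826 * (1.0 + 5.0 / (len(rows) - 2)) * math.sqrt(best_med)
    inliers = [r for r in rows if abs(residual(r, best)) <= 2.5 * sigma]
    return least_squares(inliers), inliers


def make_flow_rows(rng, n=40, n_outliers=12, flow=(1.0, -0.5), noise_std=0.01):
    """Synthetic constraint equations; the first n_outliers are gross errors."""
    rows = []
    for i in range(n):
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        c = a * flow[0] + b * flow[1] + rng.gauss(0.0, noise_std)
        if i < n_outliers:
            c += rng.choice((-1.0, 1.0)) * rng.uniform(1.0, 5.0)
        rows.append((a, b, c))
    return rows
```

A typical call is `lmeds_flow(make_flow_rows(rng), rng)` with `rng = random.Random(1)`. Because the median ignores up to half of the residuals, a candidate computed from two clean equations scores well even when a sizeable fraction of the equations is corrupted, which is what lets the final least squares step run on the inlier group only.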

••

TL;DR: A novel approach to the three-dimensional human body model acquisition from three mutually orthogonal views based on the spatiotemporal analysis of the deforming apparent contour of a human moving according to a protocol of movements.

Abstract: We present a novel approach to the three-dimensional human body model acquisition from three mutually orthogonal views. Our technique is based on the spatiotemporal analysis of the deforming apparent contour of a human moving according to a protocol of movements. For generality and robustness our technique does not use a prior model of the human body and a prior body part segmentation is not assumed. Therefore, our technique applies to humans of any anthropometric dimension. To parameterize and segment over time a deforming apparent contour, we introduce a new shape representation technique based on primitive composition. The composed deformable model allows us to represent large local deformations and their evolution in a compact and intuitive way. In addition, this representation allows us to hypothesize an underlying part structure and test this hypothesis against the relative motion (due to forces exerted from the image data) of the defining primitives of the composed model. Furthermore, we develop a Human Body Part Decomposition Algorithm (HBPDA) that recovers all the body parts of a subject by monitoring the changes over time to the shape of the deforming silhouette. In addition, we modularize the process of simultaneous two-dimensional part determination and shape estimation by employing the Supervisory Control Theory of Discrete Event Systems. Finally, we present a novel algorithm which selectively integrates the (segmented by the HBPDA) apparent contours from three mutually orthogonal viewpoints to obtain a three-dimensional model of the subject's body parts. The effectiveness of the approach is demonstrated through a series of experiments where a subject performs a set of movements according to a protocol that reveals the structure of the human body.

••

Gunma University

TL;DR: This paper presents a statistical framework for detecting degeneracies of a geometric model by evaluating its predictive capability in terms of the expected residual, and derives the geometric AIC, which makes it possible to detect singularities in a structure-from-motion analysis without introducing any empirically adjustable thresholds.

Abstract: In building a 3-D model of the environment from image and sensor data, one must fit to the data an appropriate class of models, which can be regarded as a parametrized manifold, or geometric model, defined in the data space. In this paper, we present a statistical framework for detecting degeneracies of a geometric model by evaluating its predictive capability in terms of the expected residual and derive the geometric AIC. We show that it allows us to detect singularities in a structure-from-motion analysis without introducing any empirically adjustable thresholds. We illustrate our approach by simulation examples. We also discuss the application potential of this theory for a wide range of computer vision and robotics problems.

••

TL;DR: A coordinate-free approach to the geometry of computer vision problems is discussed, based on geometric algebra; the authors believe the present formulation to be the only one in which least-squares estimates of the motion and structure are derived simultaneously using analytic derivatives.

Abstract: We discuss a coordinate-free approach to the geometry of computer vision problems. The technique we use to analyse the three-dimensional transformations involved will be that of geometric algebra: a framework based on the algebras of Clifford and Grassmann. This is not a system designed specifically for the task in hand, but rather a framework for all mathematical physics. Central to the power of this approach is the way in which the formalism deals with rotations; for example, if we have two arbitrary sets of vectors, known to be related via a 3D rotation, the rotation is easily recoverable if the vectors are given. Extracting the rotation by conventional means is not as straightforward. The calculus associated with geometric algebra is particularly powerful, enabling one, in a very natural way, to take derivatives with respect to any multivector (general element of the algebra). What this means in practice is that we can minimize with respect to rotors representing rotations, vectors representing translations, or any other relevant geometric quantity. This has important implications for many of the least-squares problems in computer vision where one attempts to find optimal rotations, translations etc., given observed vector quantities. We will illustrate this by analysing the problem of estimating motion from a pair of images, looking particularly at the more difficult case in which we have available only 2D information and no information on range. While this problem has already been much discussed in the literature, we believe the present formulation to be the only one in which least-squares estimates of the motion and structure are derived simultaneously using analytic derivatives.

••

TL;DR: A unified treatment is presented here detailing important deviations from Lambertian behavior for both rough and smooth surfaces that have important bearing on computer vision methods relying upon assumptions about diffuse reflection.

Abstract: There are many computational vision techniques that fundamentally rely upon assumptions about the nature of diffuse reflection from object surfaces consisting of commonly occurring nonmetallic materials. Probably the most prevalent assumption made about diffuse reflection by computer vision researchers is that its reflected radiance distribution is described by the Lambertian model, whether the surface is rough or smooth. While computationally and mathematically a relatively simple model, in physical reality the Lambertian model is deficient in accurately describing the reflected radiance distribution for both rough and smooth nonmetallic surfaces. Recently, in computer vision, diffuse reflectance models have been proposed separately for rough and for smooth nonconducting dielectric surfaces, each of these models accurately predicting salient non-Lambertian phenomena that have important bearing on computer vision methods relying upon assumptions about diffuse reflection. Together these reflectance models are complementary in their respective applicability to rough and smooth surfaces. A unified treatment is presented here detailing important deviations from Lambertian behavior for both rough and smooth surfaces. Some speculation is given as to how these separate diffuse reflectance models may be combined.

••

TL;DR: In this article, the authors proposed an algorithm to automatically construct detectors for arbitrary parametric features, including edges, lines, corners, and junctions, by using realistic multi-parameter feature models and incorporating optical and sensing effects.

Abstract: Most visual features are parametric in nature, including edges, lines, corners, and junctions. We propose an algorithm to automatically construct detectors for arbitrary parametric features. To maximize robustness we use realistic multi-parameter feature models and incorporate optical and sensing effects. Each feature is represented as a densely sampled parametric manifold in a low dimensional subspace of a Hilbert space. During detection, the vector of intensity values in a window about each pixel in the image is projected into the subspace. If the projection lies sufficiently close to the feature manifold, the feature is detected and the location of the closest manifold point yields the feature parameters. The concepts of parameter reduction by normalization, dimension reduction, pattern rejection, and heuristic search are all employed to achieve the required efficiency. Detectors have been constructed for five features, namely, step edge (five parameters), roof edge (five parameters), line (six parameters), corner (five parameters), and circular disc (six parameters). The results of detailed experiments are presented which demonstrate the robustness of feature detection and the accuracy of parameter estimation.
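
The detection scheme described above can be illustrated with a heavily simplified Python sketch. It is not the authors' implementation: it assumes a 1D window of seven pixels, an ideal step edge whose only remaining parameter after normalization is its sub-pixel position, a brute-force nearest-neighbour search, and an arbitrary distance threshold of 0.2. The subspace projection, pattern rejection and heuristic search steps are omitted, and all names are hypothetical.

```python
import math


def step_edge_window(position, low=0.0, high=1.0, size=7):
    """Ideal rising step edge; pixel i covers [i, i + 1), edge at `position`."""
    return [
        low + (high - low) * min(1.0, max(0.0, i + 1 - position))
        for i in range(size)
    ]


def normalize(window):
    """Remove brightness offset and contrast (parameter reduction)."""
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm < 1e-9:
        return None  # flat window: nothing to detect
    return [x / norm for x in centered]


def build_manifold(size=7, samples_per_pixel=10):
    """Densely sample the normalized feature manifold over edge position."""
    manifold = []
    for k in range(samples_per_pixel, (size - 1) * samples_per_pixel + 1):
        position = k / samples_per_pixel
        point = normalize(step_edge_window(position, size=size))
        manifold.append((position, point))
    return manifold


def detect(window, manifold, threshold=0.2):
    """Return the estimated edge position, or None if no edge is detected."""
    unit = normalize(window)
    if unit is None:
        return None
    position, distance = min(
        ((pos, math.dist(unit, point)) for pos, point in manifold),
        key=lambda item: item[1],
    )
    return position if distance <= threshold else None


manifold = build_manifold()
# Same edge position, different brightness and contrast.
found = detect(step_edge_window(3.4, low=20.0, high=180.0), manifold)
flat = detect([50.0] * 7, manifold)
falling = detect([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0], manifold)
```

Normalization is what lets one sampled manifold serve every brightness and contrast; the closest sample then supplies the remaining parameter, here the edge position.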

••

TL;DR: A global optimization algorithm for solving the detection of significant local reflectional symmetry in grey level images is presented and is related to genetic algorithms and to adaptive random search techniques.

Abstract: The detection of significant local reflectional symmetry in grey level images is considered. Prior segmentation is not assumed, and it is intended that the results could be used for guiding visual attention and for providing side information to segmentation algorithms. A local measure of reflectional symmetry that transforms the symmetry detection problem to a global optimization problem is defined. Reflectional symmetry detection becomes equivalent to finding the global maximum of a complicated multimodal function parameterized by the location of the center of the supporting region, its size, and the orientation of the symmetry axis. Unlike previous approaches, time consuming exhaustive search is avoided. A global optimization algorithm for solving the problem is presented. It is related to genetic algorithms and to adaptive random search techniques. The efficiency of the suggested algorithm is experimentally demonstrated. Just one thousand evaluations of the local symmetry measure are typically needed in order to locate the dominant symmetry in natural test images.

••

TL;DR: The nonlinear conjugate gradient algorithm in conjunction with an incomplete Cholesky preconditioning is developed to solve the resulting nonlinear minimization problem.

Abstract: In this paper, we present two very efficient and accurate algorithms for computing optical flow. The first is a modified gradient-based regularization method, and the other is an SSD-based regularization method. For the gradient-based method, to amend the errors in the discrete image flow equation caused by numerical differentiation as well as temporal and spatial aliasing in the brightness function, we propose to selectively combine the image flow constraint and a contour-based flow constraint into the data constraint by using a reliability measure. Each data constraint is appropriately normalized to obtain an approximate minimum distance (of the data point to the linear flow equation) constraint instead of the conventional linear flow constraint. These modifications lead to robust and accurate optical flow estimation. We propose an incomplete Cholesky preconditioned conjugate gradient algorithm to solve the resulting large and sparse linear system efficiently. Our SSD-based regularization method uses a normalized SSD measure (based on reasoning similar to that of the gradient-based scheme) as the data constraint in a regularization framework. The nonlinear conjugate gradient algorithm in conjunction with an incomplete Cholesky preconditioning is developed to solve the resulting nonlinear minimization problem. Experimental results on synthetic and real image sequences for these two algorithms are given to demonstrate their performance in comparison with competing methods reported in the literature.

••

IBM

TL;DR: This paper shows that one of the measures used for distances between shapes in (an experimental version of) IBM's QBIC ("Query by Image Content") system satisfies a relaxed triangle inequality, although it does not satisfy the triangle inequality.

Abstract: Any notion of “closeness” in pattern matching should have the property that if A is close to B, and B is close to C, then A is close to C. Traditionally, this property is attained because of the triangle inequality (d(A, C) ≤ d(A, B) + d(B, C), where d represents a notion of distance). However, the full power of the triangle inequality is not needed for this property to hold. Instead, a “relaxed triangle inequality” suffices, of the form d(A, C) ≤ c(d(A, B) + d(B, C)), where c is a constant that is not too large. In this paper, we show that one of the measures used for distances between shapes in (an experimental version of) IBM's QBIC (“Query by Image Content”) system (Niblack et al., 1993) satisfies a relaxed triangle inequality, although it does not satisfy the triangle inequality.
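
To make the relaxed inequality concrete, the following Python sketch checks it for squared Euclidean distance. This is a standard textbook example and not the QBIC shape measure analysed in the paper: squared distance violates the ordinary triangle inequality but satisfies the relaxed form with c = 2. The function names are illustrative.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (not a metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def satisfies_relaxed_triangle(points, dist, c):
    """Check d(A, C) <= c * (d(A, B) + d(B, C)) over all triples of points."""
    eps = 1e-9  # tolerance for floating-point rounding
    return all(
        dist(a, c_pt) <= c * (dist(a, b) + dist(b, c_pt)) + eps
        for a, b, c_pt in itertools.product(points, repeat=3)
    )


# Collinear points 0, 1, 2: d(A, C) = 4 while d(A, B) + d(B, C) = 2.
collinear = [(0.0,), (1.0,), (2.0,)]
triangle_ok = satisfies_relaxed_triangle(collinear, sq_dist, c=1)  # False
relaxed_ok = satisfies_relaxed_triangle(collinear, sq_dist, c=2)   # True

# The relaxed form with c = 2 also holds on a random 2D point cloud.
rng = random.Random(0)
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
cloud_ok = satisfies_relaxed_triangle(cloud, sq_dist, c=2)  # True
```

With c = 1 the check reduces to the ordinary triangle inequality, which is why the collinear triple fails it.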

••

TL;DR: It is shown that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.

Abstract: The Saliency Network proposed by Shashua and Ullman (1988) is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image.
Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.

••

TL;DR: This paper proposes a novel method to obtain reliable edge and depth information by integrating a set of multi-focus images, i.e., a sequence of images taken by systematically varying the camera focus, using a step edge model.

Abstract: This paper proposes a novel method to obtain reliable edge and depth information by integrating a set of multi-focus images, i.e., a sequence of images taken by systematically varying the camera focus. In previous work on depth measurement using focusing or defocusing, the accuracy depends upon the size and location of local windows where the amount of blur is measured. In contrast, no windowing is needed in our method; the blur is evaluated from the intensity change along corresponding pixels in the multi-focus images. Such a blur analysis enables us not only to detect the edge points without using spatial differentiation but also to estimate the depth with high accuracy. In addition, the analysis result is stable because the proposed method involves integral computations such as summation and least-squares model fitting. This paper first discusses the fundamental properties of multi-focus images based on a step edge model. Then, two algorithms are presented: edge detection using an accumulated defocus image which represents the spatial distribution of blur, and depth estimation using a spatio-focal image which represents the intensity distribution along the focus axis. The experimental results demonstrate that highly precise measurement has been achieved: 0.5 pixel position fluctuation in edge detection and 0.2% error at 2.4 m in depth estimation.

••

TL;DR: A robust and accurate polarization phase-based technique for material classification is presented that has significant complementary advantages with respect to existing techniques, is computationally efficient, and can be easily implemented with existing imaging technology.

Abstract: A robust and accurate polarization phase-based technique for material classification is presented. The novelty of this technique is three-fold in (i) its theoretical development, (ii) application, and (iii) experimental implementation. The concept of phase of polarization of a light wave is introduced to computer vision for discrimination between materials according to their intrinsic electrical conductivity, such as distinguishing conducting metals from poorly conducting dielectrics. Previous work has used intensity, color and polarization component ratios. This new method is based on the physical principle that metals retard orthogonal components of light upon reflection while dielectrics do not. This method has significant complementary advantages with respect to existing techniques, is computationally efficient, and can be easily implemented with existing imaging technology. Experiments for real circuit board inspection, nonconductive and conductive glass, and outdoor object recognition have been performed to demonstrate its accuracy and potential capabilities.

••

TL;DR: The purpose of this article is to define optic flow for scalar and density images without using a priori knowledge other than its defining conservation principle, and to incorporate measurement duality, notably the scale-space paradigm.

Abstract: The purpose of this article is to define optic flow for scalar and density images without using a priori knowledge other than its defining conservation principle, and to incorporate measurement duality, notably the scale-space paradigm. It is argued that the design of optic flow based applications may benefit from a manifest separation between factual image structure on the one hand, and goal-specific details and hypotheses about image flow formation on the other.
The approach is based on a physical symmetry principle known as gauge invariance. Data-independent models can be incorporated by means of admissible gauge conditions, each of which may single out a distinct solution, but all of which must be compatible with the evidence supported by the image data. The theory is illustrated by examples and verified by simulations, and performance is compared to several techniques reported in the literature.

••

TL;DR: This paper shows that, for projective views and with structure and position defined projectively, these problems are dual because they can be solved using constraint equations where space points and camera positions occur in a reciprocal way.

Abstract: Given multiple image data from a set of points in 3D, there are two fundamental questions that can be addressed:
What is the structure of the set of points in 3D?
What are the positions of the cameras relative to the points?
In this paper we show that, for projective views and with structure and position defined projectively, these problems are dual because they can be solved using constraint equations where space points and camera positions occur in a reciprocal way. More specifically, by using canonical projective reference frames for all points in space and images, the imaging of point sets in space by multiple cameras can be captured by constraint relations involving only three different kinds of parameters, namely the coordinates of (1) space points, (2) camera positions, and (3) image points. The duality implies that the problem of computing camera positions from p points in q views can be solved with the same algorithm as the problem of directly reconstructing q+4 points in p-4 views. This unifies different approaches to projective reconstruction: methods based on external calibration and direct methods exploiting constraints that exist between shape and image invariants.
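
The size bookkeeping behind this duality can be sketched in Python. The snippet only encodes the mapping stated in the abstract (p points in q views corresponds to q+4 points in p-4 views); it does not perform any reconstruction. The function names and the validity check (more than four points, at least one view) are illustrative assumptions.

```python
def dual_problem(p, q):
    """Map 'camera positions from p points in q views' to its dual size.

    Per the abstract, the dual task reconstructs q + 4 points in p - 4 views.
    The bounds check is an assumption so that the dual has at least one view.
    """
    if p <= 4 or q < 1:
        raise ValueError("need more than 4 points and at least 1 view")
    return q + 4, p - 4


def primal_problem(points, views):
    """Inverse map: recover (p, q) from a dual instance of the given size."""
    return views + 4, points - 4


dual = dual_problem(6, 3)            # (7, 2): 7 points in 2 views
round_trip = primal_problem(*dual)   # (6, 3)
```

Applying the two maps in sequence returns the original sizes, which reflects the reciprocal roles of space points and camera positions described above.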