
Showing papers in "International Journal of Computer Vision in 2005"


Journal ArticleDOI
TL;DR: A snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions to establish a reference test set of images and performance software so that future detectors can be evaluated in the same framework.
Abstract: The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002); a detector of 'maximally stable extremal regions', proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999); a detector based on intensity extrema (Tuytelaars and Van Gool, 2000); and a detector of 'salient regions', proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

3,359 citations


Journal ArticleDOI
TL;DR: A computationally efficient framework for part-based modeling and recognition of objects, motivated by the pictorial structure models introduced by Fischler and Elschlager, that allows for qualitative descriptions of visual appearance and is suitable for generic recognition problems.
Abstract: In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
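The efficiency of this kind of part-based matching typically rests on a generalized distance transform over the spring deformation costs. Below is a brute-force O(n²) sketch of that transform (the function name and toy numbers are ours; an O(n) lower-envelope algorithm computes the same quantity in practice):

```python
import numpy as np

def quadratic_distance_transform(cost, w=1.0):
    """Generalized distance transform D(x) = min_y cost(y) + w*(x - y)^2.
    Brute-force O(n^2); a lower-envelope algorithm gives the same in O(n)."""
    x = np.arange(len(cost))
    return (cost[None, :] + w * (x[:, None] - x[None, :]) ** 2).min(axis=1)

# A part matches well at position 1; the transform spreads that evidence
# under a quadratic (spring-like) deformation penalty.
cost = np.array([5.0, 0.0, 5.0, 5.0])
d = quadratic_distance_transform(cost)   # -> [1., 0., 1., 4.]
```

Composing such transforms along the tree of parts is what makes exact inference in these models fast.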

2,514 citations


Journal ArticleDOI
TL;DR: The Euler-Lagrange equations characterizing the minimizing vector fields v_t, t ∈ [0, 1], are derived, assuming sufficient smoothness of the norm to guarantee existence of solutions in the space of diffeomorphisms.
Abstract: This paper examines the Euler-Lagrange equations for the solution of the large deformation diffeomorphic metric mapping problem studied in Dupuis et al. (1998) and Trouvé (1995), in which two images I_0, I_1 are given and connected via the diffeomorphic change of coordinates I_0 ∘ φ^{-1} = I_1, where φ = φ_1 is the end point at t = 1 of the curve φ_t, t ∈ [0, 1], satisfying φ̇_t = v_t(φ_t), t ∈ [0, 1], with φ_0 = id. The variational problem takes the form $$\arg\min_{v:\,\dot\phi_t = v_t(\phi_t)} \left( \int_0^1 \|v_t\|_V^2 \,\mathrm{d}t + \left\| I_0 \circ \phi_1^{-1} - I_1 \right\|_{L^2}^2 \right),$$ where ‖v_t‖_V is an appropriate Sobolev norm on the velocity field v_t(·), and the second term enforces matching of the images, with ‖·‖_{L^2} the squared-error norm. In this paper we derive the Euler-Lagrange equations characterizing the minimizing vector fields v_t, t ∈ [0, 1], assuming sufficient smoothness of the norm to guarantee existence of solutions in the space of diffeomorphisms. We describe the implementation of the Euler equations using a semi-Lagrangian method of computing particle flows and show the solutions for various examples. As well, we compute the metric distance on several anatomical configurations, as measured by ∫_0^1 ‖v_t‖_V dt on the geodesic shortest paths.

1,640 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the role of the smoothing/regularization processes required in local and global differential methods for optic flow computation, and propose a simple confidence measure for optic flow methods that minimize energy functionals.
Abstract: Differential methods belong to the most widely used techniques for optic flow computation in image sequences. They can be classified into local methods such as the Lucas-Kanade technique or Bigun's structure tensor method, and into global methods such as the Horn/Schunck approach and its extensions. Often local methods are more robust under noise, while global techniques yield dense flow fields. The goal of this paper is to contribute to a better understanding and the design of novel differential methods in four ways: (i) We juxtapose the role of smoothing/regularisation processes that are required in local and global differential methods for optic flow computation. (ii) This discussion motivates us to describe and evaluate a novel method that combines important advantages of local and global approaches: It yields dense flow fields that are robust against noise. (iii) Spatiotemporal and nonlinear extensions as well as multiresolution frameworks are presented for this hybrid method. (iv) We propose a simple confidence measure for optic flow methods that minimise energy functionals. It allows one to sparsify a dense flow field gradually, depending on the reliability required for the resulting flow. Comparisons with experiments from the literature demonstrate the favourable performance of the proposed methods and the confidence measure.
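To make the "local method" side of this discussion concrete, here is a minimal single-patch Lucas-Kanade estimate (a bare sketch on invented synthetic data; practical implementations add Gaussian weighting, coarse-to-fine warping, and iteration):

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Local least-squares flow (u, v) for one patch: solve the normal
    equations of the optic-flow constraint Ix*u + Iy*v + It = 0."""
    Iy, Ix = np.gradient(I1)              # spatial derivatives (rows, cols)
    It = I2 - I1                          # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic patch translated one pixel to the right
x = np.arange(32, dtype=float)
I1 = np.tile(np.sin(0.3 * x), (32, 1))
I2 = np.tile(np.sin(0.3 * (x - 1.0)), (32, 1))
u, v = lucas_kanade_patch(I1, I2)         # u close to 1, v close to 0
```

The hybrid method in the abstract replaces this purely local least-squares fit with a data term of the same form embedded in a global smoothness energy.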

1,256 citations


Journal ArticleDOI
TL;DR: A method of reliably measuring relative orientation co-occurrence statistics in a rotationally invariant manner is presented, and whether incorporating such information can enhance the classifier’s performance is discussed.
Abstract: We investigate texture classification from single images obtained under unknown viewpoint and illumination. A statistical approach is developed where textures are modelled by the joint probability distribution of filter responses. This distribution is represented by the frequency histogram of filter response cluster centres (textons). Recognition proceeds from single, uncalibrated images and the novelty here is that rotationally invariant filters are used and the filter response space is low dimensional.
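The texton-histogram representation can be sketched in a few lines (the 2-D "filter responses" and centres below are invented; in the paper the responses come from a rotationally invariant filter bank and the textons from clustering training data):

```python
import numpy as np

def texton_histogram(responses, textons):
    """Label each filter-response vector with its nearest texton
    (cluster centre) and return the normalised label histogram."""
    d = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance, a common choice for comparing texton histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Invented 2-D "filter responses" and two texton centres
textons = np.array([[0.0, 0.0], [1.0, 1.0]])
h_a = texton_histogram(np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]), textons)
h_b = texton_histogram(np.array([[0.0, 0.1], [1.1, 1.0], [0.9, 1.0]]), textons)
```

Classification then amounts to nearest-neighbour search over model histograms under such a distance.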

1,145 citations


Journal ArticleDOI
TL;DR: In order to capture the sensory variation in object recordings, this work systematically varied viewing angle, illumination angle, and illumination color for each object, and additionally captured wide-baseline stereo images.
Abstract: We present the ALOI collection of 1,000 objects recorded under various imaging circumstances. In order to capture the sensory variation in object recordings, we systematically varied viewing angle, illumination angle, and illumination color for each object, and additionally captured wide-baseline stereo images. We recorded over a hundred images of each object, yielding a total of 110,250 images for the collection. These images are made publicly available for scientific research purposes.

927 citations


Journal ArticleDOI
TL;DR: The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins and compare its performance to classical face recognition methods.
Abstract: An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the bending-invariant canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradient field, or the surface metric, is sufficient for constructing the expression-invariant representation of any given face. This allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
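At the heart of the bending-invariant canonical forms approach is a multidimensional-scaling embedding of geodesic distances. A classical-MDS sketch (our own simplification; the paper computes geodesic distances on the scanned facial surface, not on toy 1-D points):

```python
import numpy as np

def canonical_form(D, k=3):
    """Classical MDS: embed a geodesic-distance matrix D into R^k so that
    Euclidean distances approximate D; isometric deformations of the
    surface leave D, and hence the embedding, (nearly) unchanged."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Sanity check on points along a line, where geodesic = Euclidean
p = np.array([0.0, 1.0, 3.0])
D = np.abs(p[:, None] - p[None, :])
E = canonical_form(D, k=1)                   # 1-D embedding recovers the layout
```

Comparing faces then reduces to rigidly aligning and comparing these canonical forms.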

569 citations


Journal ArticleDOI
TL;DR: A modified particle filter is developed which is shown to be effective at searching the high-dimensional configuration spaces encountered in visual tracking of articulated body motion and to be capable of recovering full articulated body motion efficiently.
Abstract: We develop a modified particle filter which is shown to be effective at searching the high-dimensional configuration spaces (c. 30+ dimensions) encountered in visual tracking of articulated body motion. The algorithm uses a continuation principle, based on annealing, to introduce the influence of narrow peaks in the fitness function gradually. The new algorithm, termed annealed particle filtering, is shown to be capable of recovering full articulated body motion efficiently. A mechanism for achieving a soft partitioning of the search space is described and implemented, and shown to improve the algorithm's performance. Likewise, the introduction of a crossover operator is shown to improve the effectiveness of the search for kinematic trees (such as a human body). Results are given for a variety of agile motions such as walking, running and jumping.
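The annealing idea can be caricatured in one dimension as follows (the fitness function, layer exponents, and noise schedule are invented for illustration; the real tracker works in a roughly 30-dimensional pose space with an image-based fitness):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Invented fitness: one narrow peak at the true "pose" 1.5 over a
    # small uniform floor (hard for a plain particle filter to find).
    return np.exp(-200.0 * (x - 1.5) ** 2) + 1e-3

def annealed_pf(n=500, betas=(0.1, 0.3, 0.6, 1.0)):
    particles = rng.uniform(-3.0, 3.0, n)       # broad initial guess
    noise = 0.5
    for beta in betas:                          # one annealing run, soft -> sharp
        w = fitness(particles) ** beta          # annealed weights
        w /= w.sum()
        particles = rng.choice(particles, size=n, p=w)      # resample
        particles = particles + rng.normal(0.0, noise, n)   # diffuse survivors
        noise *= 0.5                            # narrower search each layer
    w = fitness(particles)
    return float(np.sum(w * particles) / w.sum())

estimate = annealed_pf()                        # lands near the narrow peak
```

Early layers with small exponents let particles survive away from the narrow peak; later layers sharpen the weighting so the population collapses onto it.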

486 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for parsing images into their constituent visual patterns is presented, which optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language.
Abstract: In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing corresponds to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657-673)). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

463 citations


Journal ArticleDOI
TL;DR: This paper compares the properties of various norms that are dual of Sobolev or Besov norms, and proposes a decomposition model which splits an image into three components: a first one containing the structure of the image, a second one the texture of theimage, and a third one the noise.
Abstract: Following a recent work by Y. Meyer, decomposition models into a geometrical component and a textured component have recently been proposed in image processing. In such approaches, negative Sobolev norms have been found useful for modelling oscillating patterns. In this paper, we compare the properties of various norms that are dual to Sobolev or Besov norms. We then propose a decomposition model which splits an image into three components: a first one containing the structure of the image, a second one the texture of the image, and a third one the noise. Our decomposition model relies on the use of three different semi-norms: the total variation for the geometrical component, a negative Sobolev norm for the texture, and a negative Besov norm for the noise. We illustrate our study with numerical examples.

362 citations


Journal ArticleDOI
TL;DR: A novel variational approach for segmenting the image plane into a set of regions of parametric motion on the basis of two consecutive frames from an image sequence based on a conditional probability for the spatio-temporal image gradient and a geometric prior on the estimated motion field.
Abstract: We present a novel variational approach for segmenting the image plane into a set of regions of parametric motion on the basis of two consecutive frames from an image sequence. Our model is based on a conditional probability for the spatio-temporal image gradient, given a particular velocity model, and on a geometric prior on the estimated motion field favoring motion boundaries of minimal length. Exploiting the Bayesian framework, we derive a cost functional which depends on parametric motion models for each of a set of regions and on the boundary separating these regions. The resulting functional can be interpreted as an extension of the Mumford-Shah functional from intensity segmentation to motion segmentation. In contrast to most alternative approaches, the problems of segmentation and motion estimation are jointly solved by continuous minimization of a single functional. Minimizing this functional with respect to its dynamic variables results in an eigenvalue problem for the motion parameters and in a gradient descent evolution for the motion discontinuity set. We propose two different representations of this motion boundary: an explicit spline-based implementation which can be applied to the motion-based tracking of a single moving object, and an implicit multiphase level set implementation which allows for the segmentation of an arbitrary number of multiply connected moving objects. Numerical results both for simulated ground truth experiments and for real-world sequences demonstrate the capacity of our approach to segment objects based exclusively on their relative motion.

Journal ArticleDOI
TL;DR: A set of data processing algorithms for generating textured facade meshes of cities from a series of vertical 2D surface scans and camera images obtained by a laser scanner and digital camera while driving on public roads under normal traffic conditions are developed.
Abstract: In this paper, we develop a set of data processing algorithms for generating textured facade meshes of cities from a series of vertical 2D surface scans and camera images, obtained by a laser scanner and digital camera while driving on public roads under normal traffic conditions. These processing steps are needed to cope with imperfections and non-idealities inherent in laser scanning systems such as occlusions and reflections from glass surfaces. The data is divided into easy-to-handle quasi-linear segments corresponding to approximately straight driving direction and sequential topological order of vertical laser scans; each segment is then transformed into a depth image. Dominant building structures are detected in the depth images, and points are classified into foreground and background layers. Large holes in the background layer, caused by occlusion from foreground layer objects, are filled in by planar or horizontal interpolation. The depth image is further processed by removing isolated points and filling remaining small holes. The foreground objects also leave holes in the texture of building facades, which are filled by horizontal and vertical interpolation in low frequency regions, or by a copy-paste method otherwise. We apply the above steps to a large set of data of downtown Berkeley with several million 3D points, in order to obtain texture-mapped 3D models.
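A stripped-down version of the background-layer hole filling can be sketched as row-wise linear interpolation (the paper additionally uses planar fits and copy-paste synthesis; the function name and toy depth row are ours):

```python
import numpy as np

def fill_holes_rowwise(depth, hole=0.0):
    """Fill holes in each row of a depth image by linear interpolation
    between the nearest valid neighbours on that scan row."""
    out = depth.astype(float).copy()
    for row in out:                        # each row is a view into `out`
        valid = row != hole
        if valid.sum() >= 2:
            idx = np.arange(len(row))
            row[~valid] = np.interp(idx[~valid], idx[valid], row[valid])
    return out

# Two missing depth samples between valid neighbours 1.0 and 4.0
filled = fill_holes_rowwise(np.array([[1.0, 0.0, 0.0, 4.0]]))
```

Interpolating along rows matches the quasi-linear, scan-ordered segments the pipeline produces.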

Journal ArticleDOI
TL;DR: A three-level generative image model for learning textons from texture images and a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos are presented and how general textons can be learned from generic natural images is discussed.
Abstract: Textons refer to fundamental micro-structures in natural images (and videos) and are considered as the atoms of pre-attentive human visual perception (Julesz, 1981). Unfortunately, the word "texton" remains a vague concept in the literature for lack of a good mathematical model. In this article, we first present a three-level generative image model for learning textons from texture images. In this model, an image is a superposition of a number of image bases selected from an over-complete dictionary including various Gabor and Laplacian of Gaussian functions at various locations, scales, and orientations. These image bases are, in turn, generated by a smaller number of texton elements, selected from a dictionary of textons. By analogy to the waveform-phoneme-word hierarchy in speech, the pixel-base-texton hierarchy presents an increasingly abstract visual description and leads to dimension reduction and variable decoupling. By fitting the generative model to observed images, we can learn the texton dictionary as parameters of the generative model. Then the paper proceeds to study the geometric, dynamic, and photometric structures of the texton representation by further extending the generative model to account for motion and illumination variations. (1) For the geometric structures, a texton consists of a number of image bases with deformable spatial configurations. The geometric structures are learned from static texture images. (2) For the dynamic structures, the motion of a texton is characterized by a Markov chain model in time which sometimes can switch geometric configurations during the movement. We call the moving textons "motons". The dynamic models are learned using the trajectories of the textons inferred from video sequences. (3) For photometric structures, a texton represents the set of images of a 3D surface element under varying illuminations and is called a "lighton" in this paper. We adopt an illumination-cone representation where a lighton is a texton triplet. For a given light source, a lighton image is generated as a linear sum of the three texton bases. We present a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos, and we also present some comparison studies with K-means clustering, sparse coding, independent component analysis, and transformed component analysis. We shall discuss how general textons can be learned from generic natural images.

Journal ArticleDOI
TL;DR: This paper builds a system to acquire human kinematic models consisting of precise shape, joint locations, and body part segmentation and shows how they can be used to track the motion of the person in new video sequences.
Abstract: In Part I of this paper we developed the theory and algorithms for performing Shape-From-Silhouette (SFS) across time. In this second part, we show how our temporal SFS algorithms can be used in the applications of human modeling and markerless motion tracking. First we build a system to acquire human kinematic models consisting of precise shape (constructed using the temporal SFS algorithm for rigid objects), joint locations, and body part segmentation (estimated using the temporal SFS algorithm for articulated objects). Once the kinematic models have been built, we show how they can be used to track the motion of the person in new video sequences. This marker-less tracking algorithm is based on the Visual Hull alignment algorithm used in both temporal SFS algorithms and utilizes both geometric (silhouette) and photometric (color) information.

Journal ArticleDOI
TL;DR: In this article, the authors examine the implications of shape on the process of finding dense correspondence and half-occlusions for a stereo pair of images and introduce horizontal and vertical slant to create a first order approximation to piecewise continuity.
Abstract: We examine the implications of shape on the process of finding dense correspondence and half-occlusions for a stereo pair of images. The desired property of the disparity map is that it should be a piecewise continuous function which is consistent with the images and which has the minimum number of discontinuities. To zeroth order, piecewise continuity becomes piecewise constancy. Using this approximation, we first discuss an approach for dealing with such a fronto-parallel shapeless world, and the problems involved therein. We then introduce horizontal and vertical slant to create a first order approximation to piecewise continuity. In particular, we emphasize the following geometric fact: a horizontally slanted surface (i.e., having depth variation in the direction of the separation of the two cameras) will appear horizontally stretched in one image as compared to the other image. Thus, while corresponding two images, N pixels on a scanline in one image may correspond to a different number of pixels M in the other image. This leads to three important modifications to existing stereo algorithms: (a) due to unequal sampling, existing intensity matching metrics must be modified, (b) unequal numbers of pixels in the two images must be allowed to correspond to each other, and (c) the uniqueness constraint, which is often used for detecting occlusions, must be changed to an interval uniqueness constraint. We also discuss the asymmetry between vertical and horizontal slant, and the central role of non-horizontal edges in the context of vertical slant. Using experiments, we discuss cases where existing algorithms fail, and how the incorporation of these new constraints provides correct results.

Journal ArticleDOI
TL;DR: A theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time is developed.
Abstract: Shape-From-Silhouette (SFS) is a shape reconstruction method which constructs a 3D shape estimate of an object using silhouette images of the object. The output of a SFS algorithm is known as the Visual Hull (VH). Traditionally SFS is either performed on static objects, or separately at each time instant in the case of videos of moving objects. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous and derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a Temporal SFS algorithm which consists of two steps: (1) estimate the motion of the objects over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) marker-less motion tracking.
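The basic carving step behind any Visual Hull computation can be sketched as follows (a static, orthographic toy example with invented geometry; the paper's contribution is combining such silhouette constraints across time, with stereo, to resolve the alignment ambiguity):

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """Keep a voxel iff it projects inside every silhouette.
    silhouettes: HxW boolean masks; projections: functions mapping
    (n, 3) integer points to (n, 2) pixel coordinates (u, v)."""
    keep = np.ones(len(grid), dtype=bool)
    for sil, proj in zip(silhouettes, projections):
        uv = proj(grid)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < sil.shape[1]) & \
                 (uv[:, 1] >= 0) & (uv[:, 1] < sil.shape[0])
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = sil[uv[inside, 1], uv[inside, 0]]
        keep &= hit
    return grid[keep]

# Toy scene: a 2x2x2 cube inside a 4^3 grid, seen by two orthographic
# cameras looking along the z and x axes.
xs = np.arange(4)
grid = np.array(np.meshgrid(xs, xs, xs, indexing="ij")).reshape(3, -1).T
sil = np.zeros((4, 4), dtype=bool)
sil[1:3, 1:3] = True                  # square silhouette in both views
proj_z = lambda p: p[:, [0, 1]]       # (u, v) = (x, y)
proj_x = lambda p: p[:, [1, 2]]       # (u, v) = (y, z)
hull = visual_hull([sil, sil], [proj_z, proj_x], grid)   # the 8 cube voxels
```

With more views (or, as here, more time instants once aligned), the intersection tightens toward the true shape.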

Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination is addressed, based on a rank condition that must be satisfied when the scene exhibits "specular + diffuse" reflectance characteristics, which is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene.
Abstract: We address the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination. Our approach relies on a rank condition that must be satisfied when the scene exhibits "specular + diffuse" reflectance characteristics. This constraint is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene, rather than attempting to match image-to-image directly. Minimizing such a functional yields the optimal estimate of the shape of the scene, represented by a dense surface, as well as its radiance, represented by four functions defined on such a surface. These can be used to generate novel views that capture the non-Lambertian appearance of the scene.

Journal ArticleDOI
TL;DR: The results indicate that texture perception can be approached like the experimental science of colorimetry, and the Weibull parameters are demonstrated to be sensitive to orthogonal variations in the imaging conditions, specifically to the illumination conditions, camera magnification and resolving power, andThe texture orientation.
Abstract: We report a six-stimulus basis for stochastic texture perception. Fragmentation of the scene by a chaotic process causes the spatial scene statistics to conform to a Weibull-distribution. The parameters of the Weibull distribution characterize the spatial structure of uniform stochastic textures of many different origins completely. In this paper, we report the perceptual significance of the Weibull parameters. We demonstrate the parameters to be sensitive to orthogonal variations in the imaging conditions, specifically to the illumination conditions, camera magnification and resolving power, and the texture orientation. Apparently, the Weibull parameters form a six-stimulus basis for stochastic texture description. The results indicate that texture perception can be approached like the experimental science of colorimetry.

Journal ArticleDOI
TL;DR: This work proposes an imaging model which is flexible enough to represent an arbitrary imaging system, which can be used to describe systems using fisheye lenses or compound insect eyes, which violate the assumptions of perspective projection.
Abstract: An imaging model provides a mathematical description of correspondence between points in a scene and in an image. The dominant imaging model, perspective projection, has long been used to describe traditional cameras as well as the human eye. We propose an imaging model which is flexible enough to represent an arbitrary imaging system. For example using this model we can describe systems using fisheye lenses or compound insect eyes, which violate the assumptions of perspective projection. By relaxing the requirements of perspective projection, we give imaging system designers greater freedom to explore systems which meet other requirements such as compact size and wide field of view. We formulate our model by noting that all imaging systems perform a mapping from incoming scene rays to photosensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel ray based calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.

Journal ArticleDOI
TL;DR: It is suggested that the more realistic set of assumptions of perspective SfS improves reconstruction significantly with respect to orthographic SfS, and can be used for real-life applications in fields such as endoscopy.
Abstract: Shape-from-Shading (SfS) is a fundamental problem in Computer Vision. A very common assumption in this field is that image projection is orthographic. This paper re-examines the basis of SfS, the image irradiance equation, under a perspective projection assumption. The resultant equation does not depend on the depth function directly, but rather on its natural logarithm. As such, it is invariant to scale changes of the depth function. A reconstruction method based on the perspective formula is then suggested; it is a modification of the Fast Marching method of Kimmel and Sethian. A comparison of the orthographic Fast Marching, the perspective Fast Marching and the perspective algorithm of Prados and Faugeras on synthetic images is then presented: the two perspective methods show better reconstruction results than the orthographic one, and the algorithm of Prados and Faugeras performs comparably to the perspective Fast Marching. A further comparison of the orthographic and perspective versions of the Fast Marching method on endoscopic images shows that the perspective algorithm outperforms the orthographic one. These findings suggest that the more realistic set of assumptions of perspective SfS improves reconstruction significantly with respect to orthographic SfS. The findings also provide evidence that perspective SfS can be used for real-life applications in fields such as endoscopy.

Journal ArticleDOI
TL;DR: This study analyzed sub-pixel estimation error using two different types of matching model and proposed a new algorithm to greatly reduce sub-pixel estimation error, independent of the similarity measure and the fitting function.
Abstract: Area-based image matching and sub-pixel displacement estimation using similarity measures are common methods that are used in various fields. Sub-pixel estimation using parabola fitting over three points with their similarity measures is also a common method to increase the matching resolution. However, few studies have explored the characteristics of this estimation. This study analyzed sub-pixel estimation error using two different types of matching model. Our analysis demonstrates that the estimation contains a systematic error depending on image characteristics, the similarity function, and the fitting function. This error causes some inherently problematic phenomena such as the so-called pixel-locking effect, by which the estimated positions tend to be biased toward integer values. We also show that there are good combinations of the similarity functions and fitting functions. In addition, we propose a new algorithm to greatly reduce sub-pixel estimation error. This method is independent of the similarity measure and the fitting function. Moreover, it is quite simple to implement. The advantage of our novel method is confirmed through experiments using different types of images.
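The three-point parabola fit under analysis can be sketched as follows (this is the standard estimator whose bias the paper studies, not the authors' corrected method; the toy similarity curve is ours):

```python
import numpy as np

def parabola_subpixel(sim, i):
    """Refine an integer peak i of a similarity curve by fitting a
    parabola through (i-1, i, i+1); assumes 1 <= i <= len(sim) - 2."""
    a, b, c = sim[i - 1], sim[i], sim[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)        # degenerate (flat) neighbourhood
    return i + 0.5 * (a - c) / denom

# Similarity scores sampled from an exact parabola peaked at 2.3
xs = np.arange(5)
sim = -(xs - 2.3) ** 2
peak = parabola_subpixel(sim, int(np.argmax(sim)))   # recovers 2.3
```

On real similarity curves, which are not exactly parabolic, this estimator exhibits the systematic bias (pixel locking) that the paper quantifies.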

Journal ArticleDOI
TL;DR: Two image-based models of skin appearance are developed that are suitably descriptive without the need for prohibitively complex physics-based skin models.
Abstract: Quantitative characterization of skin appearance is an important but difficult task. The skin surface is a detailed landscape, with complex geometry and local optical properties. In addition, skin features depend on many variables such as body location (e.g. forehead, cheek), subject parameters (age, gender) and imaging parameters (lighting, camera). As with many real world surfaces, skin appearance is strongly affected by the direction from which it is viewed and illuminated. Computational modeling of skin texture has potential uses in many applications including realistic rendering for computer graphics, robust face models for computer vision, computer-assisted diagnosis for dermatology, topical drug efficacy testing for the pharmaceutical industry and quantitative comparison for consumer products. In this work we present models and measurements of skin texture with an emphasis on faces. We develop two models for use in skin texture recognition. Both models are image-based representations of skin appearance that are suitably descriptive without the need for prohibitively complex physics-based skin models. Our models take into account the varied appearance of the skin with changes in illumination and viewing direction. We also present a new face texture database comprising more than 2400 images corresponding to 20 human faces, 4 locations on each face (forehead, cheek, chin and nose) and 32 combinations of imaging angles. The complete database is made publicly available for further research.

Journal ArticleDOI
TL;DR: This work considers a region Ω in R2 or R3 with generic smooth boundary B and Blum medial axis M, on which is defined a multivalued “radial vector field” U from points x on M to the points of tangency of the sphere at x with B, and defines a “geometric medial map” on M which corresponds to the differential geometric properties of B.
Abstract: We consider a region Ω in R2 or R3 with generic smooth boundary B and Blum medial axis M, on which is defined a multivalued "radial vector field" U from points x on M to the points of tangency of the sphere at x with B. We introduce a "radial shape operator" Srad and an "edge shape operator" SE which measure how U bends along M. These are not traditional differential geometric shape operators; nonetheless, we derive all local differential geometric invariants of B from these operators. This allows us to define from (M, U) a "geometric medial map" on M which corresponds, via a "radial map" from M to B, to the differential geometric properties of B. The geometric medial map also includes a description of the relative geometry of B. This is defined using the "relative critical set" of the radius function r on M. This set consists of a network of curves on M which describe where B is thickest and thinnest. It is computed using the covariant derivative of the tangential component of the unit radial vector field. We further determine how these invariants are related to the differential geometric invariants of M and how these invariants change under deforming diffeomorphisms of M.

Journal ArticleDOI
TL;DR: It is proved that surface position and shape up to third order can be derived as a function of local position, orientation and local scale measurements in the image when two orientations are available at the same point.
Abstract: We study the problem of recovering the 3D shape of an unknown smooth specular surface from a single image. The surface reflects a calibrated pattern onto the image plane of a calibrated camera. The pattern is such that points are available in the image where position, orientations, and local scale may be measured (e.g. checkerboard). We first explore the differential relationship between the local geometry of the surface around the point of reflection and the local geometry in the image. We then study the inverse problem and give necessary and sufficient conditions for recovering surface position and shape. We prove that surface position and shape up to third order can be derived as a function of local position, orientation and local scale measurements in the image when two orientations are available at the same point (e.g. a corner). Information equivalent to scale and orientation measurements can also be extracted from the reflection of a planar scene patch of arbitrary geometry, provided that the reflections of (at least) 3 distinctive points may be identified. We validate our theoretical results with both numerical simulations and experiments with real surfaces.


Journal ArticleDOI
TL;DR: A new active contour model is developed which nicely ties the desirable polygonal representation of an object directly to the image segmentation process and can robustly capture texture boundaries by means of higher-order statistics of the data, an information-theoretic measure, and its formulation as a system of ordinary differential equations.
Abstract: Curve evolution models used in image segmentation and based on image region information usually utilize simple statistics such as means and variances, and hence cannot account for the higher-order nature of the textural characteristics of image regions. In addition, object delineation by active contour methods results in a contour representation which still requires a substantial amount of data to be stored for subsequent multimedia applications such as visual information retrieval from databases. Polygonal approximations of the extracted continuous curves are required to reduce the amount of data, since polygons are powerful approximators of shapes for use in later recognition stages such as shape matching and coding. The key contribution of this paper is the development of a new active contour model which ties the desirable polygonal representation of an object directly to the image segmentation process. This model can robustly capture texture boundaries by means of higher-order statistics of the data, an information-theoretic measure, and its formulation as a system of ordinary differential equations. This new variational texture segmentation model is unsupervised, since no prior knowledge of the textural properties of image regions is used. A further contribution is a new polygon regularization algorithm based on electrostatics principles. This is a global regularizer and is more consistent than local polygon regularization in preserving local features such as corners.

Journal ArticleDOI
TL;DR: In this paper, a model for filter response statistics of natural images is integrated into a variational framework for image segmentation, and the model drives level sets toward meaningful segmentations of complex textures and natural scenes.
Abstract: We integrate a model for filter response statistics of natural images into a variational framework for image segmentation. Incorporated in a sound probabilistic distance measure, the model drives level sets toward meaningful segmentations of complex textures and natural scenes. Despite its enhanced descriptive power, our approach preserves the efficiency of level set based segmentation since each region comprises two model parameters only. Analyzing thousands of natural images we select suitable filter banks, validate the statistical basis of our model, and demonstrate that it outperforms variational segmentation methods using second-order statistics.

Journal ArticleDOI
TL;DR: The paper demonstrates the perils of texture synthesis for near-regular texture and the promise of faithfully preserving the regularity as well as the randomness in a near-regular texture sample.
Abstract: Motivated by the low structural fidelity for near-regular textures in current texture synthesis algorithms, we propose and implement an alternative texture synthesis method for near-regular texture. We view such textures as statistical departures from regular patterns and argue that a thorough understanding of their structures in terms of their translation symmetries can enhance existing methods of texture synthesis. We demonstrate the perils of texture synthesis for near-regular texture and the promise of faithfully preserving the regularity as well as the randomness in a near-regular texture sample.

Journal ArticleDOI
TL;DR: It is shown that twist representations of objects are numerically efficient and easily applied to the pose estimation problem of 3D free-form contours; the robustness and real-time performance of the algorithms are demonstrated.
Abstract: In this article we discuss the 2D-3D pose estimation problem for 3D free-form contours. In our scenario we observe objects of arbitrary 3D shape in an image of a calibrated camera. Pose estimation means estimating the relative position and orientation (a rotation and a translation) of the 3D object with respect to the reference camera system. Modeling free-form contours within the pose estimation problem is achieved by using the conformal geometric algebra, a geometric algebra which models entities as stereographically projected entities in a homogeneous model. This leads to a linear description of kinematics on the one hand and of projective geometry on the other. To model free-form contours in the conformal framework we use twists to model cycloidal curves as twist-dependent functions and interpret n-times nested twist-generated curves as functions generated by 3D Fourier descriptors. In other words, we use the twist concept to apply a spectral-domain representation of 3D contours within the pose estimation problem. We show that twist representations of objects are numerically efficient and easily applied to the pose estimation problem. The pose problem itself is formalized as an implicit problem, yielding constraint equations that must be fulfilled with respect to the unknown rigid body motion. Several experiments demonstrate the robustness and real-time performance of our algorithms.

Journal ArticleDOI
TL;DR: The conclusion is therefore that the cheaper gradient and three-base-image eigen methods should be used in preference, especially where the surfaces are Lambertian or near Lambertian.
Abstract: We present and compare five approaches for capturing, synthesising and relighting real 3D surface textures. Unlike 2D texture synthesis techniques they allow the captured textures to be relit using illumination conditions that differ from those of the original. We adapted a texture quilting method due to Efros and combined this with five different relighting representations, comprising: a set of three photometric images; surface gradient and albedo maps; polynomial texture maps; and two eigen-based representations using 3 and 6 base images. We used twelve real textures to perform quantitative tests on the relighting methods in isolation. We developed a qualitative test for the assessment of the complete synthesis systems. Ten observers were asked to rank the images obtained from the five methods using five real textures. Statistical tests were applied to the rankings. The six-base-image eigen method produced the best quantitative relighting results and in particular was better able to cope with specular surfaces. However, in the qualitative tests there were no significant performance differences detected between it and the other two top performers. Our conclusion is therefore that the cheaper gradient and three-base-image eigen methods should be used in preference, especially where the surfaces are Lambertian or near Lambertian.
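The surface gradient and albedo maps used as one of the relighting representations can be recovered from a small set of photometric images by standard Lambertian photometric stereo. A minimal sketch, assuming known, linearly independent distant light directions (the paper's exact capture procedure may differ):

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover per-pixel albedo and unit surface normals from k >= 3 images
    of a Lambertian surface under known distant lights (I = albedo * n.l)."""
    L = np.asarray(lights, dtype=float)          # (k, 3) light directions
    shape = images[0].shape
    I = np.stack([im.ravel() for im in images])  # (k, npix) intensities
    # Least-squares solve L @ g = I per pixel, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)    # (3, npix)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)      # unit normals, (3, npix)
    return albedo.reshape(shape), normals.reshape((3,) + shape)
```

The recovered normals give the gradient map directly, and relighting under a new light l is then `albedo * max(n.l, 0)` per pixel.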