
Showing papers in "International Journal of Computer Vision in 2005"


Journal ArticleDOI
TL;DR: A snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions to establish a reference test set of images and performance software so that future detectors can be evaluated in the same framework.
Abstract: The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002); a detector of 'maximally stable extremal regions', proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999); a detector based on intensity extrema (Tuytelaars and Van Gool, 2000); and a detector of 'salient regions', proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

3,359 citations


Journal ArticleDOI
TL;DR: A computationally efficient framework for part-based modeling and recognition of objects, motivated by the pictorial structure models introduced by Fischler and Elschlager, that allows for qualitative descriptions of visual appearance and is suitable for generic recognition problems.
Abstract: In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
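The efficiency of this kind of part-based matching typically rests on a generalized distance transform over the spring deformation costs. Below is a brute-force O(n²) sketch of that transform (the function name and toy numbers are ours; an O(n) lower-envelope algorithm computes the same quantity in practice):

```python
import numpy as np

def quadratic_distance_transform(cost, w=1.0):
    """Generalized distance transform D(x) = min_y cost(y) + w*(x - y)^2.
    Brute-force O(n^2); a lower-envelope algorithm gives the same in O(n)."""
    x = np.arange(len(cost))
    return (cost[None, :] + w * (x[:, None] - x[None, :]) ** 2).min(axis=1)

# A part matches well at position 1; the transform spreads that evidence
# under a quadratic (spring-like) deformation penalty.
cost = np.array([5.0, 0.0, 5.0, 5.0])
d = quadratic_distance_transform(cost)   # -> [1., 0., 1., 4.]
```

Composing such transforms along the tree of parts is what makes exact inference in these models fast.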

2,514 citations


Journal ArticleDOI
TL;DR: The Euler-Lagrange equations characterizing the minimizing vector fields v_t, t ∈ [0, 1], are derived, assuming sufficient smoothness of the norm to guarantee existence of solutions in the space of diffeomorphisms.
Abstract: This paper examines the Euler-Lagrange equations for the solution of the large deformation diffeomorphic metric mapping problem studied in Dupuis et al. (1998) and Trouvé (1995), in which two images I_0, I_1 are given and connected via the diffeomorphic change of coordinates I_0 ∘ φ^{-1} = I_1, where φ = φ_1 is the end point at t = 1 of the curve φ_t, t ∈ [0, 1], satisfying φ̇_t = v_t(φ_t), t ∈ [0, 1], with φ_0 = id. The variational problem takes the form $$\arg\min_{v:\,\dot\phi_t = v_t(\phi_t)} \left( \int_0^1 \|v_t\|_V^2 \,\mathrm{d}t + \left\| I_0 \circ \phi_1^{-1} - I_1 \right\|_{L^2}^2 \right),$$ where ‖v_t‖_V is an appropriate Sobolev norm on the velocity field v_t(·), and the second term enforces matching of the images, with ‖·‖_{L^2} the squared-error norm. In this paper we derive the Euler-Lagrange equations characterizing the minimizing vector fields v_t, t ∈ [0, 1], assuming sufficient smoothness of the norm to guarantee existence of solutions in the space of diffeomorphisms. We describe the implementation of the Euler equations using a semi-Lagrangian method of computing particle flows and show the solutions for various examples. As well, we compute the metric distance on several anatomical configurations, as measured by ∫_0^1 ‖v_t‖_V dt on the geodesic shortest paths.

1,640 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the role of the smoothing/regularization processes required in local and global differential methods for optic flow computation, and propose a simple confidence measure for optic flow methods that minimize energy functionals.
Abstract: Differential methods belong to the most widely used techniques for optic flow computation in image sequences. They can be classified into local methods such as the Lucas-Kanade technique or Bigun's structure tensor method, and into global methods such as the Horn/Schunck approach and its extensions. Often local methods are more robust under noise, while global techniques yield dense flow fields. The goal of this paper is to contribute to a better understanding and the design of novel differential methods in four ways: (i) We juxtapose the role of smoothing/regularisation processes that are required in local and global differential methods for optic flow computation. (ii) This discussion motivates us to describe and evaluate a novel method that combines important advantages of local and global approaches: It yields dense flow fields that are robust against noise. (iii) Spatiotemporal and nonlinear extensions as well as multiresolution frameworks are presented for this hybrid method. (iv) We propose a simple confidence measure for optic flow methods that minimise energy functionals. It allows one to sparsify a dense flow field gradually, depending on the reliability required for the resulting flow. Comparisons with experiments from the literature demonstrate the favourable performance of the proposed methods and the confidence measure.
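To make the "local method" side of this discussion concrete, here is a minimal single-patch Lucas-Kanade estimate (a bare sketch on invented synthetic data; practical implementations add Gaussian weighting, coarse-to-fine warping, and iteration):

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Local least-squares flow (u, v) for one patch: solve the normal
    equations of the optic-flow constraint Ix*u + Iy*v + It = 0."""
    Iy, Ix = np.gradient(I1)              # spatial derivatives (rows, cols)
    It = I2 - I1                          # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic patch translated one pixel to the right
x = np.arange(32, dtype=float)
I1 = np.tile(np.sin(0.3 * x), (32, 1))
I2 = np.tile(np.sin(0.3 * (x - 1.0)), (32, 1))
u, v = lucas_kanade_patch(I1, I2)         # u close to 1, v close to 0
```

The hybrid method in the abstract replaces this purely local least-squares fit with a data term of the same form embedded in a global smoothness energy.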

1,256 citations


Journal ArticleDOI
TL;DR: A method of reliably measuring relative orientation co-occurrence statistics in a rotationally invariant manner is presented, and whether incorporating such information can enhance the classifier’s performance is discussed.
Abstract: We investigate texture classification from single images obtained under unknown viewpoint and illumination. A statistical approach is developed where textures are modelled by the joint probability distribution of filter responses. This distribution is represented by the frequency histogram of filter response cluster centres (textons). Recognition proceeds from single, uncalibrated images and the novelty here is that rotationally invariant filters are used and the filter response space is low dimensional.
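The texton-histogram representation can be sketched in a few lines (the 2-D "filter responses" and centres below are invented; in the paper the responses come from a rotationally invariant filter bank and the textons from clustering training data):

```python
import numpy as np

def texton_histogram(responses, textons):
    """Label each filter-response vector with its nearest texton
    (cluster centre) and return the normalised label histogram."""
    d = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance, a common choice for comparing texton histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Invented 2-D "filter responses" and two texton centres
textons = np.array([[0.0, 0.0], [1.0, 1.0]])
h_a = texton_histogram(np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]), textons)
h_b = texton_histogram(np.array([[0.0, 0.1], [1.1, 1.0], [0.9, 1.0]]), textons)
```

Classification then amounts to nearest-neighbour search over model histograms under such a distance.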

1,145 citations


Journal ArticleDOI
TL;DR: In order to capture the sensory variation in object recordings, this work systematically varied viewing angle, illumination angle, and illumination color for each object, and additionally captured wide-baseline stereo images.
Abstract: We present the ALOI collection of 1,000 objects recorded under various imaging circumstances. In order to capture the sensory variation in object recordings, we systematically varied viewing angle, illumination angle, and illumination color for each object, and additionally captured wide-baseline stereo images. We recorded over a hundred images of each object, yielding a total of 110,250 images for the collection. These images are made publicly available for scientific research purposes.

927 citations


Journal ArticleDOI
TL;DR: The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins and compare its performance to classical face recognition methods.
Abstract: An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the bending-invariant canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradient field, or the surface metric, is sufficient for constructing the expression-invariant representation of any given face. This allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
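At the heart of the bending-invariant canonical forms approach is a multidimensional-scaling embedding of geodesic distances. A classical-MDS sketch (our own simplification; the paper computes geodesic distances on the scanned facial surface, not on toy 1-D points):

```python
import numpy as np

def canonical_form(D, k=3):
    """Classical MDS: embed a geodesic-distance matrix D into R^k so that
    Euclidean distances approximate D; isometric deformations of the
    surface leave D, and hence the embedding, (nearly) unchanged."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Sanity check on points along a line, where geodesic = Euclidean
p = np.array([0.0, 1.0, 3.0])
D = np.abs(p[:, None] - p[None, :])
E = canonical_form(D, k=1)                   # 1-D embedding recovers the layout
```

Comparing faces then reduces to rigidly aligning and comparing these canonical forms.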

569 citations


Journal ArticleDOI
TL;DR: A modified particle filter is developed which is shown to be effective at searching the high-dimensional configuration spaces encountered in visual tracking of articulated body motion and to be capable of recovering full articulated body motion efficiently.
Abstract: We develop a modified particle filter which is shown to be effective at searching the high-dimensional configuration spaces (c. 30+ dimensions) encountered in visual tracking of articulated body motion. The algorithm uses a continuation principle, based on annealing, to introduce the influence of narrow peaks in the fitness function gradually. The new algorithm, termed annealed particle filtering, is shown to be capable of recovering full articulated body motion efficiently. A mechanism for achieving a soft partitioning of the search space is described and implemented, and shown to improve the algorithm's performance. Likewise, the introduction of a crossover operator is shown to improve the effectiveness of the search for kinematic trees (such as a human body). Results are given for a variety of agile motions such as walking, running and jumping.
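The annealing idea can be caricatured in one dimension as follows (the fitness function, layer exponents, and noise schedule are invented for illustration; the real tracker works in a roughly 30-dimensional pose space with an image-based fitness):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Invented fitness: one narrow peak at the true "pose" 1.5 over a
    # small uniform floor (hard for a plain particle filter to find).
    return np.exp(-200.0 * (x - 1.5) ** 2) + 1e-3

def annealed_pf(n=500, betas=(0.1, 0.3, 0.6, 1.0)):
    particles = rng.uniform(-3.0, 3.0, n)       # broad initial guess
    noise = 0.5
    for beta in betas:                          # one annealing run, soft -> sharp
        w = fitness(particles) ** beta          # annealed weights
        w /= w.sum()
        particles = rng.choice(particles, size=n, p=w)      # resample
        particles = particles + rng.normal(0.0, noise, n)   # diffuse survivors
        noise *= 0.5                            # narrower search each layer
    w = fitness(particles)
    return float(np.sum(w * particles) / w.sum())

estimate = annealed_pf()                        # lands near the narrow peak
```

Early layers with small exponents let particles survive away from the narrow peak; later layers sharpen the weighting so the population collapses onto it.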

486 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for parsing images into their constituent visual patterns is presented, which optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language.
Abstract: In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing corresponds to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657-673)). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

463 citations


Journal ArticleDOI
TL;DR: This paper compares the properties of various norms that are dual of Sobolev or Besov norms, and proposes a decomposition model which splits an image into three components: a first one containing the structure of the image, a second one the texture of theimage, and a third one the noise.
Abstract: Following a recent work by Y. Meyer, decomposition models into a geometrical component and a textured component have recently been proposed in image processing. In such approaches, negative Sobolev norms have been found useful for modelling oscillating patterns. In this paper, we compare the properties of various norms that are dual to Sobolev or Besov norms. We then propose a decomposition model which splits an image into three components: a first one containing the structure of the image, a second one the texture of the image, and a third one the noise. Our decomposition model relies on the use of three different semi-norms: the total variation for the geometrical component, a negative Sobolev norm for the texture, and a negative Besov norm for the noise. We illustrate our study with numerical examples.

362 citations


Journal ArticleDOI
TL;DR: A novel variational approach for segmenting the image plane into a set of regions of parametric motion on the basis of two consecutive frames from an image sequence based on a conditional probability for the spatio-temporal image gradient and a geometric prior on the estimated motion field.
Abstract: We present a novel variational approach for segmenting the image plane into a set of regions of parametric motion on the basis of two consecutive frames from an image sequence. Our model is based on a conditional probability for the spatio-temporal image gradient, given a particular velocity model, and on a geometric prior on the estimated motion field favoring motion boundaries of minimal length. Exploiting the Bayesian framework, we derive a cost functional which depends on parametric motion models for each of a set of regions and on the boundary separating these regions. The resulting functional can be interpreted as an extension of the Mumford-Shah functional from intensity segmentation to motion segmentation. In contrast to most alternative approaches, the problems of segmentation and motion estimation are jointly solved by continuous minimization of a single functional. Minimizing this functional with respect to its dynamic variables results in an eigenvalue problem for the motion parameters and in a gradient descent evolution for the motion discontinuity set. We propose two different representations of this motion boundary: an explicit spline-based implementation which can be applied to the motion-based tracking of a single moving object, and an implicit multiphase level set implementation which allows for the segmentation of an arbitrary number of multiply connected moving objects. Numerical results both for simulated ground truth experiments and for real-world sequences demonstrate the capacity of our approach to segment objects based exclusively on their relative motion.

Journal ArticleDOI
TL;DR: A set of data processing algorithms for generating textured facade meshes of cities from a series of vertical 2D surface scans and camera images obtained by a laser scanner and digital camera while driving on public roads under normal traffic conditions are developed.
Abstract: In this paper, we develop a set of data processing algorithms for generating textured facade meshes of cities from a series of vertical 2D surface scans and camera images, obtained by a laser scanner and digital camera while driving on public roads under normal traffic conditions. These processing steps are needed to cope with imperfections and non-idealities inherent in laser scanning systems such as occlusions and reflections from glass surfaces. The data is divided into easy-to-handle quasi-linear segments corresponding to approximately straight driving direction and sequential topological order of vertical laser scans; each segment is then transformed into a depth image. Dominant building structures are detected in the depth images, and points are classified into foreground and background layers. Large holes in the background layer, caused by occlusion from foreground layer objects, are filled in by planar or horizontal interpolation. The depth image is further processed by removing isolated points and filling remaining small holes. The foreground objects also leave holes in the texture of building facades, which are filled by horizontal and vertical interpolation in low frequency regions, or by a copy-paste method otherwise. We apply the above steps to a large set of data of downtown Berkeley with several million 3D points, in order to obtain texture-mapped 3D models.
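A stripped-down version of the background-layer hole filling can be sketched as row-wise linear interpolation (the paper additionally uses planar fits and copy-paste synthesis; the function name and toy depth row are ours):

```python
import numpy as np

def fill_holes_rowwise(depth, hole=0.0):
    """Fill holes in each row of a depth image by linear interpolation
    between the nearest valid neighbours on that scan row."""
    out = depth.astype(float).copy()
    for row in out:                        # each row is a view into `out`
        valid = row != hole
        if valid.sum() >= 2:
            idx = np.arange(len(row))
            row[~valid] = np.interp(idx[~valid], idx[valid], row[valid])
    return out

# Two missing depth samples between valid neighbours 1.0 and 4.0
filled = fill_holes_rowwise(np.array([[1.0, 0.0, 0.0, 4.0]]))
```

Interpolating along rows matches the quasi-linear, scan-ordered segments the pipeline produces.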

Journal ArticleDOI
TL;DR: A three-level generative image model for learning textons from texture images and a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos are presented and how general textons can be learned from generic natural images is discussed.
Abstract: Textons refer to fundamental micro-structures in natural images (and videos) and are considered as the atoms of pre-attentive human visual perception (Julesz, 1981). Unfortunately, the word "texton" remains a vague concept in the literature for lack of a good mathematical model. In this article, we first present a three-level generative image model for learning textons from texture images. In this model, an image is a superposition of a number of image bases selected from an over-complete dictionary including various Gabor and Laplacian of Gaussian functions at various locations, scales, and orientations. These image bases are, in turn, generated by a smaller number of texton elements, selected from a dictionary of textons. By analogy to the waveform-phoneme-word hierarchy in speech, the pixel-base-texton hierarchy presents an increasingly abstract visual description and leads to dimension reduction and variable decoupling. By fitting the generative model to observed images, we can learn the texton dictionary as parameters of the generative model. Then the paper proceeds to study the geometric, dynamic, and photometric structures of the texton representation by further extending the generative model to account for motion and illumination variations. (1) For the geometric structures, a texton consists of a number of image bases with deformable spatial configurations. The geometric structures are learned from static texture images. (2) For the dynamic structures, the motion of a texton is characterized by a Markov chain model in time which sometimes can switch geometric configurations during the movement. We call the moving textons "motons". The dynamic models are learned using the trajectories of the textons inferred from video sequences. (3) For photometric structures, a texton represents the set of images of a 3D surface element under varying illuminations and is called a "lighton" in this paper. We adopt an illumination-cone representation where a lighton is a texton triplet. For a given light source, a lighton image is generated as a linear sum of the three texton bases. We present a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos, and we also present some comparison studies with K-means clustering, sparse coding, independent component analysis, and transformed component analysis. We shall discuss how general textons can be learned from generic natural images.

Journal ArticleDOI
TL;DR: This paper builds a system to acquire human kinematic models consisting of precise shape, joint locations, and body part segmentation and shows how they can be used to track the motion of the person in new video sequences.
Abstract: In Part I of this paper we developed the theory and algorithms for performing Shape-From-Silhouette (SFS) across time. In this second part, we show how our temporal SFS algorithms can be used in the applications of human modeling and markerless motion tracking. First we build a system to acquire human kinematic models consisting of precise shape (constructed using the temporal SFS algorithm for rigid objects), joint locations, and body part segmentation (estimated using the temporal SFS algorithm for articulated objects). Once the kinematic models have been built, we show how they can be used to track the motion of the person in new video sequences. This marker-less tracking algorithm is based on the Visual Hull alignment algorithm used in both temporal SFS algorithms and utilizes both geometric (silhouette) and photometric (color) information.

Journal ArticleDOI
TL;DR: In this article, the authors examine the implications of shape on the process of finding dense correspondence and half-occlusions for a stereo pair of images and introduce horizontal and vertical slant to create a first order approximation to piecewise continuity.
Abstract: We examine the implications of shape on the process of finding dense correspondence and half-occlusions for a stereo pair of images. The desired property of the disparity map is that it should be a piecewise continuous function which is consistent with the images and which has the minimum number of discontinuities. To zeroth order, piecewise continuity becomes piecewise constancy. Using this approximation, we first discuss an approach for dealing with such a fronto-parallel shapeless world, and the problems involved therein. We then introduce horizontal and vertical slant to create a first order approximation to piecewise continuity. In particular, we emphasize the following geometric fact: a horizontally slanted surface (i.e., having depth variation in the direction of the separation of the two cameras) will appear horizontally stretched in one image as compared to the other image. Thus, while corresponding two images, N pixels on a scanline in one image may correspond to a different number of pixels M in the other image. This leads to three important modifications to existing stereo algorithms: (a) due to unequal sampling, existing intensity matching metrics must be modified, (b) unequal numbers of pixels in the two images must be allowed to correspond to each other, and (c) the uniqueness constraint, which is often used for detecting occlusions, must be changed to an interval uniqueness constraint. We also discuss the asymmetry between vertical and horizontal slant, and the central role of non-horizontal edges in the context of vertical slant. Using experiments, we discuss cases where existing algorithms fail, and how the incorporation of these new constraints provides correct results.

Journal ArticleDOI
TL;DR: A theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time is developed.
Abstract: Shape-From-Silhouette (SFS) is a shape reconstruction method which constructs a 3D shape estimate of an object using silhouette images of the object. The output of a SFS algorithm is known as the Visual Hull (VH). Traditionally SFS is either performed on static objects, or separately at each time instant in the case of videos of moving objects. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous and derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a Temporal SFS algorithm which consists of two steps: (1) estimate the motion of the objects over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) marker-less motion tracking.
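The basic carving step behind any Visual Hull computation can be sketched as follows (a static, orthographic toy example with invented geometry; the paper's contribution is combining such silhouette constraints across time, with stereo, to resolve the alignment ambiguity):

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """Keep a voxel iff it projects inside every silhouette.
    silhouettes: HxW boolean masks; projections: functions mapping
    (n, 3) integer points to (n, 2) pixel coordinates (u, v)."""
    keep = np.ones(len(grid), dtype=bool)
    for sil, proj in zip(silhouettes, projections):
        uv = proj(grid)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < sil.shape[1]) & \
                 (uv[:, 1] >= 0) & (uv[:, 1] < sil.shape[0])
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = sil[uv[inside, 1], uv[inside, 0]]
        keep &= hit
    return grid[keep]

# Toy scene: a 2x2x2 cube inside a 4^3 grid, seen by two orthographic
# cameras looking along the z and x axes.
xs = np.arange(4)
grid = np.array(np.meshgrid(xs, xs, xs, indexing="ij")).reshape(3, -1).T
sil = np.zeros((4, 4), dtype=bool)
sil[1:3, 1:3] = True                  # square silhouette in both views
proj_z = lambda p: p[:, [0, 1]]       # (u, v) = (x, y)
proj_x = lambda p: p[:, [1, 2]]       # (u, v) = (y, z)
hull = visual_hull([sil, sil], [proj_z, proj_x], grid)   # the 8 cube voxels
```

With more views (or, as here, more time instants once aligned), the intersection tightens toward the true shape.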

Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination is addressed, based on a rank condition that must be satisfied when the scene exhibits "specular + diffuse" reflectance characteristics, which is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene.
Abstract: We address the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination. Our approach relies on a rank condition that must be satisfied when the scene exhibits "specular + diffuse" reflectance characteristics. This constraint is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene, rather than attempting to match image-to-image directly. Minimizing such a functional yields the optimal estimate of the shape of the scene, represented by a dense surface, as well as its radiance, represented by four functions defined on such a surface. These can be used to generate novel views that capture the non-Lambertian appearance of the scene.

Journal ArticleDOI
TL;DR: The results indicate that texture perception can be approached like the experimental science of colorimetry, and the Weibull parameters are demonstrated to be sensitive to orthogonal variations in the imaging conditions, specifically to the illumination conditions, camera magnification and resolving power, andThe texture orientation.
Abstract: We report a six-stimulus basis for stochastic texture perception. Fragmentation of the scene by a chaotic process causes the spatial scene statistics to conform to a Weibull-distribution. The parameters of the Weibull distribution characterize the spatial structure of uniform stochastic textures of many different origins completely. In this paper, we report the perceptual significance of the Weibull parameters. We demonstrate the parameters to be sensitive to orthogonal variations in the imaging conditions, specifically to the illumination conditions, camera magnification and resolving power, and the texture orientation. Apparently, the Weibull parameters form a six-stimulus basis for stochastic texture description. The results indicate that texture perception can be approached like the experimental science of colorimetry.

Journal ArticleDOI
TL;DR: This work proposes an imaging model which is flexible enough to represent an arbitrary imaging system, which can be used to describe systems using fisheye lenses or compound insect eyes, which violate the assumptions of perspective projection.
Abstract: An imaging model provides a mathematical description of correspondence between points in a scene and in an image. The dominant imaging model, perspective projection, has long been used to describe traditional cameras as well as the human eye. We propose an imaging model which is flexible enough to represent an arbitrary imaging system. For example using this model we can describe systems using fisheye lenses or compound insect eyes, which violate the assumptions of perspective projection. By relaxing the requirements of perspective projection, we give imaging system designers greater freedom to explore systems which meet other requirements such as compact size and wide field of view. We formulate our model by noting that all imaging systems perform a mapping from incoming scene rays to photosensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel ray based calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.

Journal ArticleDOI
TL;DR: It is suggested that the more realistic set of assumptions of perspective SfS improves reconstruction significantly with respect to orthographic SfS, and can be used for real-life applications in fields such as endoscopy.
Abstract: Shape-from-Shading (SfS) is a fundamental problem in Computer Vision. A very common assumption in this field is that image projection is orthographic. This paper re-examines the basis of SfS, the image irradiance equation, under a perspective projection assumption. The resultant equation does not depend on the depth function directly, but rather on its natural logarithm. As such, it is invariant to scale changes of the depth function. A reconstruction method based on the perspective formula is then suggested; it is a modification of the Fast Marching method of Kimmel and Sethian. A comparison of the orthographic Fast Marching, the perspective Fast Marching and the perspective algorithm of Prados and Faugeras on synthetic images is then presented: the two perspective methods show better reconstruction results than the orthographic one, and the algorithm of Prados and Faugeras performs comparably to the perspective Fast Marching. A further comparison of the orthographic and perspective versions of the Fast Marching method on endoscopic images shows that the perspective algorithm outperforms the orthographic one. These findings suggest that the more realistic set of assumptions of perspective SfS improves reconstruction significantly with respect to orthographic SfS. The findings also provide evidence that perspective SfS can be used for real-life applications in fields such as endoscopy.

Journal ArticleDOI
TL;DR: This study analyzed sub-pixel estimation error using two different types of matching model and proposed a new algorithm to greatly reduce sub-pixel estimation error, independent of the similarity measure and the fitting function.
Abstract: Area-based image matching and sub-pixel displacement estimation using similarity measures are common methods that are used in various fields. Sub-pixel estimation using parabola fitting over three points with their similarity measures is also a common method to increase the matching resolution. However, few studies have explored the characteristics of this estimation. This study analyzed sub-pixel estimation error using two different types of matching model. Our analysis demonstrates that the estimation contains a systematic error depending on image characteristics, the similarity function, and the fitting function. This error causes some inherently problematic phenomena such as the so-called pixel-locking effect, by which the estimated positions tend to be biased toward integer values. We also show that there are good combinations of the similarity functions and fitting functions. In addition, we propose a new algorithm to greatly reduce sub-pixel estimation error. This method is independent of the similarity measure and the fitting function. Moreover, it is quite simple to implement. The advantage of our novel method is confirmed through experiments using different types of images.
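The three-point parabola fit under analysis can be sketched as follows (this is the standard estimator whose bias the paper studies, not the authors' corrected method; the toy similarity curve is ours):

```python
import numpy as np

def parabola_subpixel(sim, i):
    """Refine an integer peak i of a similarity curve by fitting a
    parabola through (i-1, i, i+1); assumes 1 <= i <= len(sim) - 2."""
    a, b, c = sim[i - 1], sim[i], sim[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)        # degenerate (flat) neighbourhood
    return i + 0.5 * (a - c) / denom

# Similarity scores sampled from an exact parabola peaked at 2.3
xs = np.arange(5)
sim = -(xs - 2.3) ** 2
peak = parabola_subpixel(sim, int(np.argmax(sim)))   # recovers 2.3
```

On real similarity curves, which are not exactly parabolic, this estimator exhibits the systematic bias (pixel locking) that the paper quantifies.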

Journal ArticleDOI
TL;DR: Two image-based models of skin appearance are developed that are suitably descriptive without the need for prohibitively complex physics-based skin models.
Abstract: Quantitative characterization of skin appearance is an important but difficult task. The skin surface is a detailed landscape, with complex geometry and local optical properties. In addition, skin features depend on many variables such as body location (e.g. forehead, cheek), subject parameters (age, gender) and imaging parameters (lighting, camera). As with many real world surfaces, skin appearance is strongly affected by the direction from which it is viewed and illuminated. Computational modeling of skin texture has potential uses in many applications including realistic rendering for computer graphics, robust face models for computer vision, computer-assisted diagnosis for dermatology, topical drug efficacy testing for the pharmaceutical industry and quantitative comparison for consumer products. In this work we present models and measurements of skin texture with an emphasis on faces. We develop two models for use in skin texture recognition. Both models are image-based representations of skin appearance that are suitably descriptive without the need for prohibitively complex physics-based skin models. Our models take into account the varied appearance of the skin with changes in illumination and viewing direction. We also present a new face texture database comprising more than 2400 images corresponding to 20 human faces, 4 locations on each face (forehead, cheek, chin and nose) and 32 combinations of imaging angles. The complete database is made publicly available for further research.

Journal ArticleDOI
TL;DR: This work considers a region Ω in R2 or R3 with generic smooth boundary B and Blum medial axis M, on which is defined a multivalued “radial vector field” U from points x on M to the points of tangency of the sphere at x with B, and defines a “geometric medial map” on M which corresponds to the differential geometric properties of B.
Abstract: We consider a region Ω in R2 or R3 with generic smooth boundary B and Blum medial axis M, on which is defined a multivalued "radial vector field" U from points x on M to the points of tangency of the sphere at x with B. We introduce a "radial shape operator" Srad and an "edge shape operator" SE which measure how U bends along M. These are not traditional differential geometric shape operators; nonetheless, we derive all local differential geometric invariants of B from these operators. This allows us to define from (M, U) a "geometric medial map" on M which corresponds, via a "radial map" from M to B, to the differential geometric properties of B. The geometric medial map also includes a description of the relative geometry of B. This is defined using the "relative critical set" of the radius function r on M. This set consists of a network of curves on M which describe where B is thickest and thinnest. It is computed using the covariant derivative of the tangential component of the unit radial vector field. We further determine how these invariants are related to the differential geometric invariants of M and how these invariants change under deforming diffeomorphisms of M.

Journal ArticleDOI
TL;DR: It is proved that surface position and shape up to third order can be derived as a function of local position, orientation and local scale measurements in the image when two orientations are available at the same point.
Abstract: We study the problem of recovering the 3D shape of an unknown smooth specular surface from a single image. The surface reflects a calibrated pattern onto the image plane of a calibrated camera. The pattern is such that points are available in the image where position, orientations, and local scale may be measured (e.g. checkerboard). We first explore the differential relationship between the local geometry of the surface around the point of reflection and the local geometry in the image. We then study the inverse problem and give necessary and sufficient conditions for recovering surface position and shape. We prove that surface position and shape up to third order can be derived as a function of local position, orientation and local scale measurements in the image when two orientations are available at the same point (e.g. a corner). Information equivalent to scale and orientation measurements can also be extracted from the reflection of a planar scene patch of arbitrary geometry, provided that the reflections of (at least) 3 distinctive points may be identified. We validate our theoretical results with both numerical simulations and experiments with real surfaces.


Journal ArticleDOI
TL;DR: A new active contour model is developed which nicely ties the desirable polygonal representation of an object directly to the image segmentation process and can robustly capture texture boundaries by means of higher-order statistics of the data, an information-theoretic measure, and its formulation as a system of ordinary differential equations.
Abstract: Curve evolution models used in image segmentation and based on image region information usually utilize simple statistics such as means and variances, and hence cannot account for the higher-order nature of the textural characteristics of image regions. In addition, object delineation by active contour methods results in a contour representation which still requires a substantial amount of data to be stored for subsequent multimedia applications such as visual information retrieval from databases. Polygonal approximations of the extracted continuous curves are required to reduce the amount of data, since polygons are powerful approximators of shapes for use in later recognition stages such as shape matching and coding. The key contribution of this paper is the development of a new active contour model which ties the desirable polygonal representation of an object directly to the image segmentation process. This model can robustly capture texture boundaries by means of higher-order statistics of the data, an information-theoretic measure, and its formulation as a system of ordinary differential equations. This new variational texture segmentation model is unsupervised, since no prior knowledge of the textural properties of image regions is used. A further contribution is a new polygon regularization algorithm based on electrostatics principles. This is a global regularizer and is more consistent than local polygon regularization in preserving local features such as corners.

Journal ArticleDOI
TL;DR: In this paper, a model for filter response statistics of natural images is integrated into a variational framework for image segmentation, and the model drives level sets toward meaningful segmentations of complex textures and natural scenes.
Abstract: We integrate a model for filter response statistics of natural images into a variational framework for image segmentation. Incorporated in a sound probabilistic distance measure, the model drives level sets toward meaningful segmentations of complex textures and natural scenes. Despite its enhanced descriptive power, our approach preserves the efficiency of level set based segmentation since each region comprises two model parameters only. Analyzing thousands of natural images we select suitable filter banks, validate the statistical basis of our model, and demonstrate that it outperforms variational segmentation methods using second-order statistics.

Journal ArticleDOI
TL;DR: The paper demonstrates the perils of texture synthesis for near-regular texture and the promise of faithfully preserving the regularity as well as the randomness in a near-regular texture sample.
Abstract: Motivated by the low structural fidelity for near-regular textures in current texture synthesis algorithms, we propose and implement an alternative texture synthesis method for near-regular texture. We view such textures as statistical departures from regular patterns and argue that a thorough understanding of their structures in terms of their translation symmetries can enhance existing methods of texture synthesis. We demonstrate the perils of texture synthesis for near-regular texture and the promise of faithfully preserving the regularity as well as the randomness in a near-regular texture sample.

Journal ArticleDOI
TL;DR: It is shown that twist representations of objects are numerically efficient and easily applied to the pose estimation problem of 3D free-form contours; the robustness and real-time performance of the algorithms are demonstrated.
Abstract: In this article we discuss the 2D-3D pose estimation problem for 3D free-form contours. In our scenario we observe objects of arbitrary 3D shape in an image of a calibrated camera. Pose estimation means estimating the relative position and orientation (a rotation and a translation) of the 3D object with respect to the reference camera system. Modeling free-form contours within the pose estimation problem is achieved by using the conformal geometric algebra, a geometric algebra which models entities as stereographically projected entities in a homogeneous model. This leads to a linear description of kinematics on the one hand and of projective geometry on the other. To model free-form contours in the conformal framework we use twists to model cycloidal curves as twist-dependent functions and interpret n-times nested twist-generated curves as functions generated by 3D Fourier descriptors. In other words, we use the twist concept to apply a spectral-domain representation of 3D contours within the pose estimation problem. We show that twist representations of objects are numerically efficient and easily applied to the pose estimation problem. The pose problem itself is formalized as an implicit problem, yielding constraint equations that must be fulfilled with respect to the unknown rigid body motion. Several experiments demonstrate the robustness and real-time performance of our algorithms.

Journal ArticleDOI
TL;DR: The conclusion is therefore that the cheaper gradient and three-base-image eigen methods should be used in preference, especially where the surfaces are Lambertian or near Lambertian.
Abstract: We present and compare five approaches for capturing, synthesising and relighting real 3D surface textures. Unlike 2D texture synthesis techniques they allow the captured textures to be relit using illumination conditions that differ from those of the original. We adapted a texture quilting method due to Efros and combined this with five different relighting representations, comprising: a set of three photometric images; surface gradient and albedo maps; polynomial texture maps; and two eigen-based representations using 3 and 6 base images. We used twelve real textures to perform quantitative tests on the relighting methods in isolation. We developed a qualitative test for the assessment of the complete synthesis systems. Ten observers were asked to rank the images obtained from the five methods using five real textures. Statistical tests were applied to the rankings. The six-base-image eigen method produced the best quantitative relighting results and in particular was better able to cope with specular surfaces. However, in the qualitative tests there were no significant performance differences detected between it and the other two top performers. Our conclusion is therefore that the cheaper gradient and three-base-image eigen methods should be used in preference, especially where the surfaces are Lambertian or near Lambertian.
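The surface gradient and albedo maps used as one of the relighting representations can be recovered from a small set of photometric images by standard Lambertian photometric stereo. A minimal sketch, assuming known, linearly independent distant light directions (the paper's exact capture procedure may differ):

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover per-pixel albedo and unit surface normals from k >= 3 images
    of a Lambertian surface under known distant lights (I = albedo * n.l)."""
    L = np.asarray(lights, dtype=float)          # (k, 3) light directions
    shape = images[0].shape
    I = np.stack([im.ravel() for im in images])  # (k, npix) intensities
    # Least-squares solve L @ g = I per pixel, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)    # (3, npix)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)      # unit normals, (3, npix)
    return albedo.reshape(shape), normals.reshape((3,) + shape)
```

The recovered normals give the gradient map directly, and relighting under a new light l is then `albedo * max(n.l, 0)` per pixel.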