Proceedings ArticleDOI

Robust wide baseline stereo from maximally stable extremal regions

01 Jan 2002 - pp 1-10
TL;DR: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied and an efficient and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal region (MSER).
Abstract: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied. A new set of image elements that are put into correspondence, the so called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under (1) continuous (and thus projective) transformation of image coordinates and (2) monotonic transformation of image intensities. An efficient (near linear complexity) and practically fast detection algorithm (near frame rate) is presented for an affinely invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some that are significantly larger (and hence discriminative) than the MSERs, may be used to establish tentative correspondences. The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained.

Summary (2 min read)

1 Introduction

  • Finding reliable correspondences in two images of a scene taken from arbitrary viewpoints viewed with possibly different cameras and in different illumination conditions is a difficult and critical step towards fully automatic reconstruction of 3D scenes [5].
  • Successful wide-baseline experiments on indoor and outdoor datasets presented in Section 4 demonstrate the potential of MSERs.
  • Finding epipolar geometry consistent with the largest number of tentative correspondences is the final step of all wide-baseline algorithms.
  • Baumberg [1] applied an iterative scheme originally proposed by Lindeberg and Garding to associate affine-invariant measurement regions with Harris interest points.
  • Maximally Stable Extremal Regions are defined and their detection algorithm is described in Section 2.

2 Maximally Stable Extremal Regions

  • The authors introduce a new type of image elements useful in wide-baseline matching — the Maximally Stable Extremal Regions.
  • The concept can be explained informally as follows.
  • Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing maximally stable extremal regions.
  • Every extremal region is a connected component of a thresholded image.
  • The output of the MSER detector is not a binarized image.

3 The proposed robust wide-baseline algorithm

  • As a first step, the DRs are detected - the MSERs computed on the intensity image (MSER+) and on the inverted image (MSER-).
  • Smaller measurement regions are both more likely to satisfy the planarity condition and not to cross a discontinuity in depth or orientation.
  • In all experiments, rotational invariants (based on complex moments) were used after applying a transformation that diagonalises the covariance matrix of the DR.
  • First, an affine transformation between pairs of potentially corresponding DRs, i.e. the DRs consistent with the rough EG, is computed.
  • Next, DR correspondences are pruned and only those with correlation of their transformed images above a threshold are selected.

4 Experiments

  • The following experiments were conducted: Bookshelf (Fig. 1).
  • The part of the scene visible in both views covers a small fraction of the image.
  • The regions matched on the box demonstrate performance on a non-planar surface.
  • The final number of correspondences is given in the penultimate column ’fine EG’.
  • The authors can see that the precision of the estimated epipolar geometry is very high, much higher than the precision of the rough EG.

5 Conclusions

  • In the paper, a new method for wide-baseline matching was proposed.
  • The three main novelties are: the introduction of MSERs, robust matching of local features and the use of multiple scaled measurement regions.
  • Another novelty of the approach is the use of a robust similarity measure for establishing tentative correspondences.
  • The average distance from corresponding points to the epipolar line was below 0.09 of the inter-pixel distance.
  • Test images included both outdoor and indoor scenes, some already used in published work.


Robust Wide Baseline Stereo from Maximally Stable Extremal Regions

J. Matas¹,², O. Chum¹, M. Urban¹, T. Pajdla¹

¹ Center for Machine Perception, Dept. of Cybernetics, CTU Prague, Karlovo nám 13, CZ 121 35
² CVSSP, University of Surrey, Guildford GU2 7XH, UK
[matas, chum]@cmp.felk.cvut.cz

BMVC 2002 doi:10.5244/C.16.36
Abstract

The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied. A new set of image elements that are put into correspondence, the so called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under 1. continuous (and thus projective) transformation of image coordinates and 2. monotonic transformation of image intensities. An efficient (near linear complexity) and practically fast detection algorithm (near frame rate) is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal regions (MSER).

A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some that are significantly larger (and hence discriminative) than the MSERs, may be used to establish tentative correspondences.

The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained.
1 Introduction

Finding reliable correspondences in two images of a scene taken from arbitrary viewpoints viewed with possibly different cameras and in different illumination conditions is a difficult and critical step towards fully automatic reconstruction of 3D scenes [5]. A crucial issue is the choice of elements whose correspondence is sought. In the wide-baseline set-up, local image deformations cannot be realistically approximated by translation or translation with rotation and a full affine model is required. Correspondence cannot therefore be established by comparing regions of a fixed (Euclidean) shape like rectangles or circles since their shape is not preserved under affine transformation.

In most images there are regions that can be detected with high repeatability since they possess some distinguishing, invariant and stable property. We argue that such regions of, in general, data-dependent shape, called distinguished regions (DRs) in the paper, may serve as the elements to be put into correspondence either in stereo matching or object recognition.
The first contribution of the paper is the introduction of a new set of distinguished regions, the so called extremal regions. Extremal regions have two desirable properties. The set is closed under continuous (and thus perspective) transformation of image coordinates and, secondly, it is closed under monotonic transformation of image intensities. An efficient (near linear complexity) and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). Robustness of a particular type of DR depends on the image data and must be tested experimentally. Successful wide-baseline experiments on indoor and outdoor datasets presented in Section 4 demonstrate the potential of MSERs.

Reliable extraction of a manageable number of potentially corresponding image elements is a necessary but certainly not a sufficient prerequisite for successful wide-baseline matching. With two sets of distinguished regions, the matching problem can be posed as a search in the correspondence space [3]. Forming a complete bipartite graph on the two sets of DRs and searching for a globally consistent subset of correspondences is clearly out of the question for computational reasons. Recently, a whole class of stereo matching and object recognition algorithms with common structure has emerged [9, 15, 1, 16, 2, 13, 7, 6]. These methods exploit local invariant descriptors to limit the number of tentative correspondences. Important design decisions at this stage include: 1. the choice of measurement regions, i.e. the parts of the image on which invariants are computed, 2. the method of selecting tentative correspondences given the invariant description and 3. the choice of invariants.

Typically, distinguished regions or their scaled version serve as measurement regions and tentative correspondences are established by comparing invariants using Mahalanobis distance [10, 16, 11]. As a second novelty of the presented approach, a robust similarity measure for establishing tentative correspondences is proposed to replace the Mahalanobis distance. The robustness of the proposed similarity measure allows us to use invariants from a collection of measurement regions, even some that are much larger than the associated distinguished region. Measurements from large regions are either very discriminative (it is very unlikely that two large parts of the image are identical) or completely wrong (e.g. if orientation or depth discontinuity becomes part of the region). The former helps establish reliable tentative (local) correspondences; the influence of the latter is limited due to the robustness of the approach.
Finding epipolar geometry consistent with the largest number of tentative (local) correspondences is the final step of all wide-baseline algorithms. RANSAC has been by far the most widely adopted method since [14]. The presented algorithm takes novel steps to increase the number of matched regions and the precision of the epipolar geometry. The rough epipolar geometry estimated from tentative correspondences is used to guide the search for further region matches. It restricts location to epipolar lines and provides an estimate of affine mapping between corresponding regions. This mapping allows the use of correlation to filter out mismatches. The process significantly increases precision of the EG estimate; the final average inlier distance-from-epipolar-line is below 0.1 pixel. For details see Section 3.
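To make this final step concrete, the following is a minimal sketch (not the authors' implementation) of RANSAC-based epipolar geometry estimation using OpenCV; the arrays pts1 and pts2 of tentative point correspondences and the inlier threshold are assumed inputs.

```python
# Hypothetical illustration: estimate the epipolar geometry from tentative
# correspondences with RANSAC and report the average distance of inlier
# points from their epipolar lines.
import numpy as np
import cv2

def estimate_epipolar_geometry(pts1, pts2, inlier_thresh=1.0):
    """pts1, pts2: Nx2 float32 arrays of tentative correspondences (assumed)."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, inlier_thresh, 0.99)
    inl = mask.ravel().astype(bool)
    # Epipolar lines in image 2 for the inlier points of image 1; the returned
    # line coefficients (a, b, c) are normalised so that a^2 + b^2 = 1.
    lines2 = cv2.computeCorrespondEpilines(pts1[inl].reshape(-1, 1, 2), 1, F).reshape(-1, 3)
    pts2_h = np.hstack([pts2[inl], np.ones((inl.sum(), 1), dtype=pts2.dtype)])
    dist = np.abs(np.sum(lines2 * pts2_h, axis=1))       # point-to-line distances
    return F, inl, dist.mean()
```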
Related work. Since the influential paper by Schmid and Mohr [11] many image matching and wide-baseline stereo algorithms have been proposed, most commonly using Harris interest points as distinguished regions.

Image I is a mapping I : D ⊂ Z² → S. Extremal regions are well defined on images if:

1. S is totally ordered, i.e. a reflexive, antisymmetric and transitive binary relation ≤ exists. In this paper only S = {0, 1, ..., 255} is considered, but extremal regions can be defined on e.g. real-valued images (S = R).

2. An adjacency (neighbourhood) relation A ⊂ D × D is defined. In this paper 4-neighbourhoods are used, i.e. p, q ∈ D are adjacent (pAq) iff Σ_{i=1}^{d} |p_i − q_i| ≤ 1.

Region Q is a contiguous subset of D, i.e. for each p, q ∈ Q there is a sequence p, a_1, a_2, ..., a_n, q and pAa_1, a_iAa_{i+1}, a_nAq.

(Outer) Region Boundary ∂Q = {q ∈ D \ Q : ∃ p ∈ Q : qAp}, i.e. the boundary ∂Q of Q is the set of pixels being adjacent to at least one pixel of Q but not belonging to Q.

Extremal Region Q ⊂ D is a region such that for all p ∈ Q, q ∈ ∂Q : I(p) > I(q) (maximum intensity region) or I(p) < I(q) (minimum intensity region).

Maximally Stable Extremal Region (MSER). Let Q_1, ..., Q_{i−1}, Q_i, ... be a sequence of nested extremal regions, i.e. Q_i ⊂ Q_{i+1}. Extremal region Q_i is maximally stable iff q(i) = |Q_{i+∆} \ Q_i| / |Q_i| has a local minimum at i (|·| denotes cardinality). ∆ ∈ S is a parameter of the method.

Table 1: Definitions used in Section 2
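As a small worked illustration of the stability criterion q(i) above (an assumed toy example, not code from the paper), one can take the areas |Q_i| of a nested sequence of extremal regions and mark the thresholds where the relative area change has a local minimum.

```python
# Toy illustration of q(i) = |Q_{i+Delta} \ Q_i| / |Q_i|; for nested regions
# the numerator equals |Q_{i+Delta}| - |Q_i|.
import numpy as np

def mser_stability(areas, delta):
    a = np.asarray(areas, dtype=float)          # a[i] = |Q_i| at threshold i
    q = np.full(a.shape, np.inf)
    q[:-delta] = (a[delta:] - a[:-delta]) / a[:-delta]
    # indices where q(i) has a local minimum -> maximally stable regions
    minima = [i for i in range(1, len(a) - 1) if q[i] <= q[i - 1] and q[i] <= q[i + 1]]
    return q, minima

areas = [10, 11, 11, 12, 40, 90, 200]           # made-up area growth with threshold
print(mser_stability(areas, delta=1))           # prints q and the indices of its local minima
```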
Tell and Carlsson [13] proposed a method where line segments connecting Harris interest points form measurement regions. The measurements are characterised by scale invariant Fourier coefficients. The Harris interest detector is stable over a range of scales, but defines no scale or affine invariant measurement region. Baumberg [1] applied an iterative scheme originally proposed by Lindeberg and Garding to associate affine-invariant measurement regions with Harris interest points. In [7], Mikolajczyk and Schmid show that a scale-invariant MR can be found around Harris interest points. In [9], Pritchett and Zisserman form groups of line segments and estimate local homographies using parallelograms as measurement regions. Tuytelaars and Van Gool introduced two new classes of affine-invariant distinguished regions, one based on local intensity extrema [16], the other using point and curve features [15]. In the latter approach, DRs are characterised by measurements from inside an ellipse, constructed in an affine invariant manner. Lowe [6] describes the 'Scale Invariant Feature Transform' approach which produces a scale and orientation-invariant characterisation of interest points.

The rest of the paper is structured as follows. Maximally Stable Extremal Regions are defined and their detection algorithm is described in Section 2. In Section 3, details of a novel robust matching algorithm are given. Experimental results on outdoor and indoor images taken with an uncalibrated camera are presented in Section 4. Presented experiments are summarized and the contributions of the paper are reviewed in Section 5.
2 Maximally Stable Extremal Regions

In this section, we introduce a new type of image elements useful in wide-baseline matching: the Maximally Stable Extremal Regions. The regions are defined solely by an extremal property of the intensity function in the region and on its outer boundary.

The concept can be explained informally as follows. Imagine all possible thresholdings of a gray-level image I. We will refer to the pixels below a threshold as 'black' and to those above or equal as 'white'. If we were shown a movie of thresholded images I_t, with frame t corresponding to threshold t, we would see first a white image. Subsequently black spots corresponding to local intensity minima will appear and grow. At some point regions corresponding to two local minima will merge. Finally, the last image will be black. The set of all connected components of all frames of the movie is the set of all maximal regions; minimal regions could be obtained by inverting the intensity of I and running the same process. The formal definition of the MSER concept and the necessary auxiliary definitions are given in Table 1.
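A brief sketch of this "movie of thresholdings" (purely illustrative; the efficient detector described below does not build it explicitly), assuming an 8-bit image and SciPy's connected-component labelling with its default 4-neighbourhood:

```python
import numpy as np
from scipy import ndimage

def threshold_movie(img):
    """img: 2-D uint8 array. Yields, per threshold, the labelled 'black' components."""
    for t in range(256):
        black = img < t                       # frame t: pixels below the threshold
        labels, n = ndimage.label(black)      # default structuring element = 4-neighbourhood
        yield t, n, labels                    # components appear, grow and merge as t increases
```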
In many images, local binarization is stable over a large range of thresholds in certain regions. Such regions are of interest since they possess the following properties:

  • Invariance to affine transformation of image intensities.
  • Covariance to adjacency preserving (continuous) transformation T : D → D on the image domain.
  • Stability, since only extremal regions whose support is virtually unchanged over a range of thresholds are selected.
  • Multi-scale detection. Since no smoothing is involved, both very fine and very large structure is detected.
  • The set of all extremal regions can be enumerated in O(n log log n), where n is the number of pixels in the image.
Enumeration of extremal regions proceeds as follows. First, pixels are sorted by intensity. The computational complexity of this step is O(n) if the range of image values S is small, e.g. the typical {0, ..., 255}, since the sort can be implemented as BINSORT [12]. After sorting, pixels are placed in the image (either in decreasing or increasing order) and the list of connected components and their areas is maintained using the efficient union-find algorithm [12]. The complexity of our union-find implementation is O(n log log n), i.e. almost linear¹. Importantly, the algorithm is very fast in practice. The MSER detection takes only 0.14 seconds on a Linux PC with the Athlon XP 1600+ processor for a 530×350 image (n = 185500).

The process produces a data structure storing the area of each connected component as a function of intensity. A merge of two components is viewed as termination of existence of the smaller component and an insertion of all pixels of the smaller component into the larger one. Finally, intensity levels that are local minima of the rate of change of the area function are selected as thresholds producing maximally stable extremal regions. In the output, each MSER is represented by position of a local intensity minimum (or maximum) and a threshold.
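The following is a compact sketch of the sorted-insertion / union-find scheme just described, an illustration under the stated assumptions (8-bit image, 4-neighbourhood) rather than the authors' implementation; it records, for each component, its area as pixels of increasing intensity are inserted, which is the data the MSER selection step operates on. np.argsort merely stands in for the linear-time BINSORT.

```python
import numpy as np

def component_area_evolution(img):
    """img: 2-D uint8 array. Returns {component root: [(intensity, area), ...]}."""
    h, w = img.shape
    flat = img.ravel()
    order = np.argsort(flat, kind='stable')            # stand-in for BINSORT
    parent = np.full(h * w, -1, dtype=np.int64)        # -1 means "not yet inserted"
    area = np.zeros(h * w, dtype=np.int64)
    history = {}                                        # area as a function of intensity

    def find(x):                                        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in order:
        parent[p] = p
        area[p] = 1
        y, x = divmod(int(p), w)
        neighbours = [p - 1 if x > 0 else -1, p + 1 if x < w - 1 else -1,
                      p - w if y > 0 else -1, p + w if y < h - 1 else -1]
        for q in neighbours:
            if q >= 0 and parent[q] != -1:              # neighbour already inserted
                rp, rq = find(p), find(q)
                if rp != rq:
                    if area[rp] < area[rq]:             # the smaller component terminates,
                        rp, rq = rq, rp                 # its pixels join the larger one
                    parent[rq] = rp
                    area[rp] += area[rq]
        r = find(p)
        history.setdefault(r, []).append((int(flat[p]), int(area[r])))
    return history
```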
Notes. The structure of the above algorithm and of an efficient watershed algorithm [17] is essentially identical. However, the structure of the output of the two algorithms is different. The watershed is a partitioning of D, i.e. a set of regions R_i : ∪ R_i = D, R_j ∩ R_k = ∅ for j ≠ k. In watershed computation, focus is on the thresholds where regions merge (and two watersheds touch). Such thresholds are of little interest here, since they are highly unstable: after a merge, the region area jumps. In MSER detection, we seek a range of thresholds that leaves the watershed basin effectively unchanged. Detection of MSER is also related to thresholding. Every extremal region is a connected component of a thresholded image. However, no global or 'optimal' threshold is sought, all thresholds are tested and the stability of the connected components evaluated. The output of the MSER detector is not a binarized image. For some parts of the image, multiple stable thresholds exist and a system of nested subsets is output in this case. Finally, we remark that MSERs can be defined on any image (even high-dimensional) whose pixel values are from a totally ordered set.

¹ Even faster (but more complex) connected component algorithms exist with O(nα(n)) complexity, where α is the inverse Ackermann function; α(n) ≤ 4 for all practical n.
3 The proposed robust wide-baseline algorithm

Distinguished region detection. As a first step, the DRs are detected: the MSERs computed on the intensity image (MSER+) and on the inverted image (MSER-).

Measurement regions. A measurement region of arbitrary size may be associated with each DR, if the construction is affine-covariant. Smaller measurement regions are both more likely to satisfy the planarity condition and not to cross a discontinuity in depth or orientation. On the other hand, small regions are less discriminative, i.e. they are much less likely to be unique. Increasing the size of a measurement region carries the risk of including parts of background that are completely different in the two images considered. Clearly, the optimal size of a MR depends on the scene content and it is different for each DR. In [16], Tuytelaars et al. double the elliptical DR to increase discriminability, while keeping the probability of crossing object boundaries at an acceptable level.

In the proposed algorithm, measurement regions are selected at multiple scales: the DR itself, and the 1.5, 2 and 3 times scaled convex hull of the DR. Since matching is accomplished in a robust manner, we benefit from the increase of distinctiveness of large regions without being severely affected by clutter or non-planarity of the DR's pre-image. This is a novelty of our approach. Commonly, Mahalanobis distance has been used in MR matching. However, the non-robustness of this metric means that matching may fail because of a single corrupted measurement (this happened in the experiments reported below).
Invariant description. In all experiments, rotational invariants (based on complex moments) were used after applying a transformation that diagonalises the covariance matrix of the DR. In combination, this is an affinely-invariant procedure. A combination of rotational and affinely invariant generalised colour moments [8] gave a similar result. On their own, these affine invariants failed on problems with a large scale change.
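A short sketch (assumed, not the paper's code) of the normalisation step described above: transform the DR's pixel coordinates so that their covariance matrix becomes the identity; rotational invariants computed after this whitening yield an affinely-invariant description up to the remaining rotation ambiguity.

```python
import numpy as np

def affine_normalise(dr_pixels):
    """dr_pixels: Nx2 array of region coordinates with non-degenerate covariance."""
    pts = np.asarray(dr_pixels, dtype=float)
    c = pts.mean(axis=0)
    cov = np.cov((pts - c).T)                    # 2x2 covariance of the region shape
    w, V = np.linalg.eigh(cov)
    T = V @ np.diag(1.0 / np.sqrt(w)) @ V.T      # inverse square root of the covariance
    return (pts - c) @ T.T, T                    # whitened coordinates and the transform
```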
Robust matching. A measurement taken from an almost planar patch of the scene with stable invariant description will be referred to as a 'good measurement'. Unstable measurements or those computed on non-planar surfaces or at discontinuities in depth or orientation will be referred to as 'corrupted measurements'.

The robust similarity is computed as follows. For each measurement M^i_A on region A, the k regions B_1, ..., B_k from the other image with the corresponding i-th measurements M^i_{B_1}, ..., M^i_{B_k} nearest to M^i_A are found, and a vote is cast suggesting correspondence of A and each of B_1, ..., B_k. Votes are summed over all measurements. In the current implementation 216 invariants at each scale, i.e. a total of 864 measurements, are used (i ∈ [1, 864]). The DRs with the largest number of votes are the candidates for tentative correspondences. Experimentally, we found that k set to 1% of the number of regions gives good results.

Probabilistic analysis of the likelihood of the success of the procedure is not simple, since the distribution of invariants and their noise is image-dependent. We therefore only suppose that corrupted measurements spread their votes randomly, not conspiring to create a high score, and that good measurements are more likely to vote for correct matches.
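An illustrative sketch of this voting scheme (the array shapes and names are assumptions; in the paper each region carries 864 scalar measurements and k is set to roughly 1% of the number of regions):

```python
import numpy as np

def vote_tentative_correspondences(meas_A, meas_B, k):
    """meas_A: (nA, m) and meas_B: (nB, m) arrays of invariant measurements."""
    n_a, m = meas_A.shape
    votes = np.zeros((n_a, meas_B.shape[0]), dtype=int)
    for i in range(m):                                   # each measurement independently
        d = np.abs(meas_A[:, i:i + 1] - meas_B[:, i])    # (nA, nB) distances on measurement i
        nearest = np.argsort(d, axis=1)[:, :k]           # k nearest regions in the other image
        for a in range(n_a):
            votes[a, nearest[a]] += 1                    # one vote per near measurement
    # region pairs with the most votes become the tentative correspondences;
    # corrupted measurements only spread their votes and rarely conspire
    return votes
```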

Citations
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations


Cites background or methods from "Robust wide baseline stereo from ma..."

  • ...In what appears to be the most affine-invariant method, Mikolajczyk (2002) has proposed and run detailed experiments with the Harris-affine detector....


  • ...Matas et al. (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability....


Journal ArticleDOI
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

43,540 citations

Book ChapterDOI
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.

13,011 citations

Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations



Proceedings ArticleDOI
Sivic, Zisserman
13 Oct 2003
TL;DR: An approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video, represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion.
Abstract: We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching in two full length feature films.

6,938 citations


Additional excerpts

  • ...The implementation details are given in [7]....


References
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations


"Robust wide baseline stereo from ma..." refers background in this paper

  • ...Lowe [7] describes the ‘Scale Invariant Feature Transform’ approach which produces a scale and orientation-invariant characterisation of interest points....


  • ...Recently, a whole class of stereo matching and object recognition algorithms with common structure has emerged [1,3,7,9,10,13,15,18,20,21]....


Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations



"Robust wide baseline stereo from ma..." refers background or methods in this paper

  • ...Finding reliable correspondences in two images of a scene taken from arbitrary viewpoints viewed with possibly different cameras and in different illumination conditions is a difficult and critical step towards fully automatic reconstruction of 3D scenes [5]....


  • ...After establishing the ‘rough EG’ the so-called ‘guided matching’ step is applied [2,5]....


Journal ArticleDOI
TL;DR: A fast and flexible algorithm for computing watersheds in digital gray-scale images is introduced, based on an immersion process analogy, which is reported to be faster than any other watershed algorithm.
Abstract: A fast and flexible algorithm for computing watersheds in digital gray-scale images is introduced. A review of watersheds and related notions is first presented, and the major methods to determine watersheds are discussed. The algorithm is based on an immersion process analogy, in which the flooding of the water in the picture is efficiently simulated using a queue of pixels. It is described in detail, provided in a pseudo C language. The accuracy of this algorithm is proven to be superior to that of the existing implementations, and it is shown that its adaptation to any kind of digital grid and its generalization to n-dimensional images (and even to graphs) are straightforward. The algorithm is reported to be faster than any other watershed algorithm. Applications of this algorithm with regard to picture segmentation are presented for magnetic resonance (MR) imagery and for digital elevation models. An example of 3-D watershed is also provided.

4,983 citations


"Robust wide baseline stereo from ma..." refers background in this paper

  • ...The structure of the above algorithm and of an efficient watershed algorithm [22] is essentially identical....


Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Robust wide baseline stereo from ma..." refers methods in this paper

  • ...Since the influential paper by Schmid and Mohr [11] many image matching and wide-baseline stereo algorithms have been proposed, most commonly using Harris interest points as distinguished regions....


  • ...Since the influential paper by Schmid and Mohr [16] many image matching and wide-baseline stereo algorithms have been proposed, most commonly using Harris interest points as DRs....


  • ...Typically, DRs or their scaled version serve as measurement regions and tentative correspondences are established by comparing invariants using Mahalanobis distance [14,16,21]....


Frequently Asked Questions

Q1. What is the definition of a MSER?

The MSERs are sets of image elements, closed under the affine transformation of image coordinates and invariant to affine transformation of intensity. They are an affinely-invariant stable subset of the extremal regions, for which the paper presents an efficient (near linear complexity) and practically fast (near frame rate) detection algorithm.

Q2. What future work do the authors propose?

In future work, the authors intend to proceed towards fully automatic projective reconstruction of the 3D scene, which requires computing projective reconstruction and dense matching.