Image Registration with Uncalibrated Cameras in Hybrid Vision Systems
Datong Chen, Jie Yang
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
This paper addresses the problem of robustly registering images between perspective and omnidirectional cameras in a hybrid vision system (HVS). The nonlinearity introduced into an HVS by omnidirectional cameras poses challenges for computing pixel correspondences among images. In previous HVSs, cameras must be calibrated before registration can be performed. In this paper, we propose a non-linear approach for registering images in an HVS without requiring calibration of the cameras. We first discuss the homographies between omnidirectional and perspective images under a local planar assumption. We then propose a robust patch-level registration algorithm that exploits a constraint on large 3D spatial planes. The proposed approach enables an HVS to be used in applications that require quick deployment or active cameras. Experimental results demonstrate the feasibility of the proposed approach.
1 Introduction
Recent demands for video surveillance over large areas have stimulated research interest in camera networks. A hybrid vision system (HVS) is a camera network that consists of omnidirectional and perspective cameras. Such a system takes advantage of the large field of view of omnidirectional cameras and the higher resolution of perspective cameras. For example, Chen et al. [3] proposed an HVS architecture in which an omnidirectional camera was mounted on the ceiling at the center of a large room and several perspective cameras were mounted on the surrounding side walls. The omnidirectional camera not only provides a good reference for the cameras in the network but also minimizes the possibility of occlusions during tracking. The perspective cameras can capture more detailed information at higher resolution. However, the nonlinearity introduced into an HVS by omnidirectional cameras poses challenges for many existing computer vision techniques, including the computation of pixel correspondences between omnidirectional and perspective views.
In the previous literature, correspondences between higher-resolution side-view images and lower-resolution top-view images are computed using a 2D perspective homography [4]. This approach requires pre-calibration of the intrinsic parameters of the omnidirectional camera. The most commonly used omnidirectional camera is a catadioptric camera, which is composed of a perspective camera and a mirror and provides a single effective viewpoint [10]. The calibration of a catadioptric camera has been addressed by many researchers [7, 2, 9, 13]. After an omnidirectional camera is calibrated, the 2D perspective homography assumes that the scene in front of each camera is planar and registers perspective-view images using a warped omnidirectional-view image as the reference [8, 4].
A major drawback of the 2D approach is that the calibration step involves manual interaction or specially designed calibration tags with specific patterns or shapes. This limits the applicability of an HVS when quick deployment is required or auto-zooming cameras are employed. In addition, the existing registration methods can produce large errors because the scene is not planar. Some efforts have been made to provide 3D information in an HVS by calibrating the extrinsic parameters among cameras. Sturm analyzed catadioptric cameras and perspective cameras within a common scene [14]. Chen et al. proposed a manual solution based on pre-measured points in real 3D space [3]. Stereo methods [1, 15, 5, 11, 8] were proposed for object detection and reconstruction in an HVS. Calibration of the omnidirectional camera is required by all of these methods.
In this paper, we propose an automatic approach to image registration with uncalibrated cameras in an HVS. We first discuss the geometric correspondence of a planar object between a perspective camera and a catadioptric camera and give homography matrices in both directions. We then propose an algorithm for registering an image from a perspective camera to a catadioptric image under the assumption that local image patches are projections of planar surfaces. In the proposed algorithm, non-linear 2D registration is performed at a local patch level. A robust estimation methodology is proposed for propagating the homography of a patch to its neighborhood. We demonstrate the feasibility of the proposed method through experiments.

Figure 1: An illustration of an HVS in an indoor public environment, with one catadioptric camera and several perspective cameras.
2 Mathematical models of an HVS
Let us consider a simple HVS that consists of only one catadioptric camera and several perspective cameras. Fig. 1 illustrates such a system installed in an indoor public environment. We will limit our discussion to such an HVS in the rest of this paper, though the results can be extended to more complex systems.
2.1 A catadioptric camera model
A commercial catadioptric camera can be modeled as a combination of a paraboloid mirror and lenses (see Fig. 2). To exploit the optical characteristics of a catadioptric system in the spatial domain, we can follow the process by which a catadioptric camera acquires an optical signal from a spatial point P = (X, Y, Z)^T, as shown in Figure 2. Without loss of generality, let us select the focus of the paraboloid mirror as the origin O_c. The signal from point P = (X, Y, Z)^T is first reflected at P_m = (X_m, Y_m, Z_m)^T on the mirror and then projected onto the image plane at p = (x, y, Z_c)^T. To simplify this projection process, we assume the camera is focused on a virtual focal plane F. The mirror point P_m is first transformed onto the focal plane F at the point P_f = (X_f, Y_f, Z_f)^T and then projected onto the image plane, which can be modeled by the following equations:

P_m = α_P P,   P_f = P_m + T_f,   p = R_I P_f,    (1)

where the scale factor α_P is a function of the spatial point P, T_f = (0, 0, Z_f − Z_m)^T denotes the translation between the focal plane F and the mirror, and R_I models the perspective projection from the focal plane to the image plane.
Figure 2: A model of a catadioptric camera (paraboloid mirror, virtual focal plane F, image plane, and pin-hole).
The focal plane Z = Z_f has only one parameter. In such a paraboloid-mirror-based catadioptric system, two parameters, α_P and T_f, depend on the 3D spatial point. The other parameters, R_I and Z_f, consist only of constant values.
Furthermore, for a point P_m = (X_m, Y_m, Z_m)^T on the paraboloidal mirror, the paraboloid can be described as:

Z_m = f − (1/(4f)) (X_m^2 + Y_m^2),    (2)

where f is the focal length of the mirror.
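As a concrete illustration of Eqs. 1 and 2, the following sketch projects a 3D point through the paraboloid-mirror model by solving for the point-dependent scale α_P. The values of f and Z_f and the identity choice for R_I are placeholders for illustration only, not values from the paper.

```python
import numpy as np

def project_catadioptric(P, f=1.0, Z_f=2.0):
    """Project a 3D point P = (X, Y, Z) through the paraboloid-mirror model
    of Eqs. (1)-(2). Illustrative only: f, Z_f and the identity R_I are
    placeholder choices."""
    X, Y, Z = P
    r2 = X**2 + Y**2
    # Find alpha_P such that P_m = alpha_P * P lies on the mirror (Eq. 2):
    #   alpha*Z = f - (alpha^2 * r2) / (4f)
    a, b, c = r2 / (4.0 * f), Z, -f               # a*alpha^2 + b*alpha + c = 0
    alpha = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a) if a > 0 else -c / b
    P_m = alpha * np.asarray(P, dtype=float)       # reflection point on the mirror
    T_f = np.array([0.0, 0.0, Z_f - P_m[2]])       # translation to the focal plane F
    P_f = P_m + T_f                                # point on the virtual focal plane
    R_I = np.eye(3)                                # focal-plane-to-image projection (identity here)
    p = R_I @ P_f
    return p, alpha

p, alpha_P = project_catadioptric(np.array([1.0, 0.5, -2.0]))
```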
2.2 A perspective camera model
There are many different approaches to modeling a perspective camera. In this paper, we use the linear pin-hole model. In this model, the geometric relationship between a spatial point P̂ = (X̂, Ŷ, Ẑ)^T and its projection p̂ = (f_1 x̂, f_1 ŷ, f_1)^T on the image plane can be modeled as:

Ẑ p̂ = P̂.    (3)

To simplify the discussion in this paper, we assume that the principal point is located at the image center, the aspect ratio of the optical axis is 1, and the focal length f_1 (which is a scalar of the system) equals 1. However, the results in this paper can be extended to more complex linear pin-hole models.

2.3 Corresponding points in an HVS
An advantage of an HVS is that the catadioptric camera can provide a global view of a scene. Therefore, the calibration of most HVSs relies on corresponding point pairs between a perspective image and the catadioptric image. Let us assume that a spatial point P under the catadioptric camera coordinate system corresponds to a point P̂ under the coordinate system of a perspective camera. The transform between this point pair can be defined as:

P̂ = R_e P + T_e.    (4)

Substituting Eq. 1 and 3 into 4, we have

Ẑ p̂ = (R p + T) / α_P + T_e,    (5)

where R = R_e R_I^{-1}, T = −R_e T_f, and T_e are homography-related parameters which need to be estimated, and p and p̂ are the projections of the spatial point on the catadioptric and perspective image planes, respectively. In general, these homography-related parameters cannot be computed, since both Ẑ and α_P contain unknown depth information about the spatial point. In the next sections, we give a specific solution for estimating these homography-related parameters.
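The algebra of Eqs. 4 and 5 can be made explicit with a small sketch. Note that it assumes R_e, T_e, R_I, T_f and the depth-dependent scale α_P are all known, which is precisely the information the remainder of the paper avoids having to estimate explicitly; the function name is illustrative.

```python
import numpy as np

def catadioptric_to_perspective(p, alpha_P, R_e, T_e, R_I, T_f):
    """Map a catadioptric image point p to its perspective-image counterpart
    via Eq. (5), assuming all parameters are known (illustration only)."""
    R = R_e @ np.linalg.inv(R_I)            # R = R_e R_I^{-1}
    T = -R_e @ T_f                          # T = -R_e T_f
    P_hat = (R @ p + T) / alpha_P + T_e     # right-hand side of Eq. (5), i.e. Z_hat * p_hat
    return P_hat / P_hat[2]                 # normalize by Z_hat (pin-hole model, f_1 = 1)
```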
3 Homography of a planar object in an HVS
Without loss of generality, we can assume that most local regions in the images represent planar surfaces in the scene. The homography from a catadioptric image to a perspective image can be modeled using a 3 × 4 matrix, as proposed in [14]:

p̂ = (x̂, ŷ, 1)^T = H_1 (x, y, x^2 + y^2 − 1/4, 1)^T.    (6)
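For illustration, applying the 3 × 4 homography of Eq. 6 amounts to lifting a catadioptric pixel to the quadratic coordinates above and multiplying; the sketch below assumes H_1 has already been estimated, and the constant in the lifted coordinate follows the reconstruction of Eq. 6 given here.

```python
import numpy as np

def apply_H1(H1, p):
    """Apply the 3x4 homography of Eq. (6): lift a catadioptric pixel (x, y)
    to (x, y, x^2 + y^2 - 1/4, 1) and map it to a perspective pixel."""
    x, y = p
    lifted = np.array([x, y, x**2 + y**2 - 0.25, 1.0])
    q = H1 @ lifted
    return q[:2] / q[2]                     # back to inhomogeneous image coordinates
```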
The homography from a perspective image to a catadioptric image, which will be used in the registration task, is a little more complex. Suppose that a target object has a planar surface WP + b = 0 under the catadioptric coordinate system. Representing this surface using pixels on the catadioptric image plane defined by Eq. 1, we have (1/α_P) W R_e^{-1}(R p + T) + b = 0. This is a useful constraint on the scale factor α_P:

α_P = − W R_e^{-1} (R p + T) / b.    (7)

Substituting Eq. 7 into Eq. 5, we obtain a 3 × 6 homography matrix:

p = H_2 (x̂^2, ŷ^2, x̂ŷ, x̂, ŷ, 1)^T.    (8)

This homography actually has a constraint similar to Eq. 6. However, due to the ambiguity when mapping a perspective image back to the catadioptric surface, we do not have a "linear" equation. We can search for the homography matrix of Eq. 8 under the constraint of Eq. 6. The algorithm can be briefly described as:

1. Initialize H_2^t;
2. Compute all the corresponding points p^t = H_2^t p̂^t;
3. Register p^t back to the perspective image to obtain p̂^{t+1};
4. Update H_2^{t+1} = H_2^t + λ correlation(p̂^t, p̂^{t+1});
5. Loop to step 2 until the stop condition is satisfied.
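A sketch of the forward map of Eq. 8 is shown below; the 3 × 6 matrix H_2 is taken as given here, whereas in the paper it is obtained by the iterative search of steps 1-5, whose correlation-based update is not spelled out in detail.

```python
import numpy as np

def lift_perspective(p_hat):
    """Lifted coordinates used by the 3x6 homography of Eq. (8)."""
    x, y = p_hat
    return np.array([x**2, y**2, x*y, x, y, 1.0])

def apply_H2(H2, p_hat):
    """Map a perspective pixel to the catadioptric image with Eq. (8).
    H2 is a 3x6 matrix, assumed already estimated for this sketch."""
    q = H2 @ lift_perspective(p_hat)
    return q[:2] / q[2]
```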
4 Patch-level image registration
Estimating the homography matrix of a planar surface is a traditional image registration task [6]. The difficulty arises because the original scene captured by the cameras is not planar, as assumed in a 2D approach. To address this problem, we divide an image from a perspective camera into small patches B_i and assume that each patch corresponds to a planar surface in 3D space. The image registration is therefore performed at the patch level.

To address this patch-level registration, we propose an algorithm consisting of three main iterative steps: patch selection, patch registration, and homography propagation, which is outlined as follows:
Algorithm of robust homography propagation at a patch level

1. Partition a perspective image into n partitions (patches) and label all the patches as unregistered;
2. Select an unregistered patch B with the highest variance;
3. Register the patch B using the technique described in the last section;
4. Propagate the homography of patch B to its unregistered neighbors that are located in the same 3D spatial plane;
5. If there are unregistered patches, go to step 2; otherwise, end.
In this algorithm, we partition an image into patches of the same size. The algorithm iterates over steps 2 to 4 until all the patches are registered. The details of steps 3 and 4 are discussed in the following subsections.
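A structural sketch of this outer loop is given below, under the assumption that register_patch and propagate stand in for the Haar-space registration of Section 4.1 and the robust propagation of Section 4.2; both names and the regular grid partition are illustrative.

```python
import numpy as np

def patch_level_registration(image, n_rows, n_cols, register_patch, propagate):
    """Structural sketch of the patch-level algorithm above.

    `register_patch` and `propagate` are placeholders for the Haar-space
    registration (Section 4.1) and the robust propagation (Section 4.2).
    """
    h, w = image.shape[:2]
    ph, pw = h // n_rows, w // n_cols
    patches = {(i, j): image[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(n_rows) for j in range(n_cols)}
    unregistered = set(patches)             # step 1: all patches start unregistered
    homographies = {}
    while unregistered:
        # step 2: pick the unregistered patch with the highest intensity variance
        seed = max(unregistered, key=lambda k: patches[k].var())
        homographies[seed] = register_patch(patches[seed])        # step 3
        unregistered.discard(seed)
        # step 4: propagate the seed homography to unregistered 8-neighbors
        propagate(seed, homographies, patches, unregistered)
        # step 5: loop until every patch is registered
    return homographies
```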
4.1 Registration in a Haar feature space
Patch registration (step 3) is performed in a Haar feature space. Haar wavelets are chosen because they can model texture at different scales and can be computed very efficiently. The Haar wavelets decompose a given image patch B into four sub-bands: a low-frequency band B^l, a vertical high-frequency band B^v, a horizontal high-frequency band B^h, and a diagonal high-frequency band B^d.

Figure 3: An image patch and its Haar decomposition. The four bands in the image on the right correspond to B^l (top-left), B^v (top-right), B^h (bottom-left), and B^d (bottom-right).
Figure 3 illustrates the Haar decomposition of a large image patch. For a point p = (x, y) in the image patch B, its Haar feature values are defined as:
B^l_{x,y} = (1/4)(B_{2x,2y} + B_{2x,2y+1} + B_{2x+1,2y} + B_{2x+1,2y+1}),
B^v_{x,y} = (1/4)(B_{2x,2y} − B_{2x,2y+1} + B_{2x+1,2y} − B_{2x+1,2y+1}),
B^h_{x,y} = (1/4)(B_{2x,2y} + B_{2x,2y+1} − B_{2x+1,2y} − B_{2x+1,2y+1}),
B^d_{x,y} = (1/4)(B_{2x,2y} − B_{2x,2y+1} − B_{2x+1,2y} + B_{2x+1,2y+1}).
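These four sub-bands can be computed with simple strided sums and differences; the sketch below assumes a grayscale patch with even height and width.

```python
import numpy as np

def haar_subbands(B):
    """One-level Haar decomposition of an image patch B, following the four
    formulas above (B is assumed grayscale with even height and width)."""
    B = np.asarray(B, dtype=float)
    a = B[0::2, 0::2]   # B_{2x,2y}
    b = B[0::2, 1::2]   # B_{2x,2y+1}
    c = B[1::2, 0::2]   # B_{2x+1,2y}
    d = B[1::2, 1::2]   # B_{2x+1,2y+1}
    Bl = (a + b + c + d) / 4.0   # low-frequency band
    Bv = (a - b + c - d) / 4.0   # vertical high-frequency band
    Bh = (a + b - c - d) / 4.0   # horizontal high-frequency band
    Bd = (a - b - c + d) / 4.0   # diagonal high-frequency band
    return Bl, Bv, Bh, Bd
```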
The registration task is to minimize the following objective function with respect to the homography matrix H and the global photometric parameters θ = (a_j, b_j):

f(H, θ) = Σ_{j ∈ {l,v,h,d}} Σ_{p̂ ∈ B} (B^j_{p̂} − a_j I^j_{H p̂} − b_j)^2,    (9)

where I^j is the Haar feature image from the catadioptric camera. This minimization involves non-linear constraints and can be solved with the Levenberg-Marquardt technique [6, 12].
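A minimal sketch of this minimization, using a generic Levenberg-Marquardt least-squares solver, is given below. The parameterization of H, the warping-and-sampling of the catadioptric Haar images, and the function names are assumptions of the sketch rather than details given in the paper.

```python
import numpy as np
from scipy.optimize import least_squares

def register_patch_haar(patch_bands, warp_and_sample, h0, theta0):
    """Minimize the objective of Eq. (9) with Levenberg-Marquardt.

    patch_bands     : dict of the four Haar bands of patch B, e.g. {'l': Bl, ...}
    warp_and_sample : callable (band_name, h) -> catadioptric Haar values I^j
                      sampled at H p for every pixel p of the patch (the lifting
                      and interpolation inside it are not shown here)
    h0, theta0      : numpy vectors with the initial homography parameters and
                      the photometric parameters (a_l..a_d, b_l..b_d)
    """
    bands = ('l', 'v', 'h', 'd')

    def residuals(params):
        h = params[:h0.size]
        a = params[h0.size:h0.size + 4]
        b = params[h0.size + 4:]
        res = []
        for j, name in enumerate(bands):
            pred = a[j] * warp_and_sample(name, h) + b[j]   # a_j I^j(Hp) + b_j
            res.append(patch_bands[name].ravel() - pred.ravel())
        return np.concatenate(res)

    x0 = np.concatenate([h0, theta0])
    sol = least_squares(residuals, x0, method='lm')          # Levenberg-Marquardt
    return sol.x[:h0.size], sol.x[h0.size:]
```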
4.2 Robust homography propagation
The homography propagation step (step 4) propagates the homography H obtained from the registration of patch B in step 3 to its neighbors. From Eq. 7, we can observe that a homography matrix contains the factors W and b associated with a spatial plane. Therefore, patches coming from the same 3D spatial plane should share the same homography. Firstly, we use H to initialize a set of homographies S_H = {H_1 = H} and set the seed homography H* = H. Then, we register the unregistered patches among the 8 neighbors of the patch B. The seed homography H* is used as the initialization in the Levenberg-Marquardt based registration algorithm. The resulting homographies are added to the set S_H = {H_1, ..., H_n}. The new seed homography H*, with entries h*_ij, is then updated as:

h*_ij = (Σ_{k=1}^{n} β^k_ij h^k_ij) / (Σ_{k=1}^{n} β^k_ij),
where

β^k_ij = 1 if |h^k_ij − m_ij| ≤ δ σ_ij, and 0 otherwise,

and the mean m_ij is defined as:

m_ij = argmin_m r^m_ij,   with   r^m_ij = Med_{k=1,...,n} (h^k_ij − m)^2.

Med denotes the median operator. The variance σ_ij is computed as:

σ_ij = 1.48 × (1 + 5/(n − 1)) √(r_ij^{m_ij}).

The threshold δ is a tradeoff between precision and robustness of the registration performance.
We iteratively propagate the registration to the 8 neighbors of each newly registered patch until the weights β^k_ij of all the newly registered patches are equal to zero.
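The entry-wise update can be sketched as follows; the LMedS location m_ij is approximated by searching over the sample values themselves, and the value of δ is only illustrative.

```python
import numpy as np

def update_seed_homography(H_list, delta=2.5):
    """Robust entry-wise update of the seed homography from a set of neighbor
    homographies, following the weighted scheme above.

    H_list : list of homography matrices of identical shape.
    delta  : precision/robustness threshold (illustrative value only).
    """
    H = np.stack(H_list)                        # shape (n, rows, cols)
    n = H.shape[0]
    H_seed = np.zeros(H.shape[1:])
    for idx in np.ndindex(H.shape[1:]):
        samples = H[(slice(None),) + idx]       # the n values h^k_ij
        # m_ij = argmin_m Med_k (h^k_ij - m)^2, searched over the samples themselves
        med_res = [np.median((samples - m) ** 2) for m in samples]
        m = samples[int(np.argmin(med_res))]
        sigma = 1.48 * (1.0 + 5.0 / max(n - 1, 1)) * np.sqrt(min(med_res))
        beta = (np.abs(samples - m) <= delta * sigma).astype(float)   # inlier weights
        H_seed[idx] = beta @ samples / beta.sum() if beta.sum() > 0 else m
    return H_seed
```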
This registration algorithm mainly focuses on registering large background planes such as walls and the ground floor. Small foreground objects are usually not planar enough and have too low a resolution; therefore, their registration results can be noisier.
5 Experimental results
The proposed approach is evaluated on images obtained from a catadioptric camera and a perspective camera. Fig. 4 shows two of these images: (a) a top-view image from a Cyclovision catadioptric camera; (b) a side-view image from a SONY perspective camera. The catadioptric image has a resolution of 640 × 480. The resolution of the perspective image is 800 × 600.
During the patch-level registration process, we first estimate the translation and scale parameters, then the global photometric parameters, and finally the homography. In Fig. 5, we display the registration results for different methods and patch sizes: (a) image registration results using the 2D perspective homography; (b) image registration results using the non-linear homography with only one partition; (c) image registration results using the non-linear homography with 2 × 2 partitions; (d) image registration results using the non-linear homography with 4 × 4 partitions. The traditional 2D perspective registration does not work well without pre-calibrating and warping the catadioptric image. The proposed approach gives better registration results using the non-linear homography.

Figure 4: Experiment images: (a) a top-view image from a Cyclovision catadioptric camera; (b) a side-view image from a SONY perspective camera.
Compared with the result in (b), the result in (c) clearly shows that there are three 3D spatial “planes” in the scene: the wall combined with the foreground objects, the wall on the right, and the ground floor. When 16 partitions are used, the part consisting of the wall combined with the foreground objects is registered better. However, the registration of the patches on the wall on the right becomes noisy because some partitions do not contain enough texture.
To evaluate the registration results more precisely, we manually labeled 60 corresponding points {(p_i, p̂_i)} in the catadioptric image and the perspective image, which are shown in Fig. 6. With respect to this ground truth, the registration error is measured as the sum of the distances between the registered coordinates of the 60 points and their ground-truth coordinates:

E = Σ_{i=1}^{60} || p_i − H p̂_i ||.    (10)
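For illustration, the error of Eq. 10 can be accumulated as below, where warp stands for the estimated patch-level mapping from perspective to catadioptric coordinates (e.g., apply_H2 above with the appropriate per-patch homography).

```python
import numpy as np

def registration_error(points_cat, points_persp, warp):
    """Eq. (10): sum of distances between the labeled catadioptric points p_i
    and the warped perspective points H p_hat_i. `warp` is the estimated
    patch-level mapping (illustrative placeholder)."""
    return sum(np.linalg.norm(np.asarray(p) - np.asarray(warp(q)))
               for p, q in zip(points_cat, points_persp))
```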
Fig. 7 shows the registration errors for different numbers of partitions. We can observe that the registration error decreases as the local patch size decreases.
Figure 5: Image registration result comparisons: (a) image registration result using the 2D perspective homography; (b) image registration result using the proposed method with only 1 partition; (c) image registration result using the proposed method with 4 partitions; (d) image registration result using the proposed method with 16 partitions.
Figure 6: Corresponding points in the ground truth.
